{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5aa65d7d",
   "metadata": {},
   "source": [
    "# Prompt Optimization for Language Models with DSPy GEPA\n",
    "\n",
    "_Authored by: [Behrooz Azarkhalili](https://github.com/behroozazarkhalili)_\n",
    "\n",
    "This notebook demonstrates how to use [DSPy](https://dspy.ai/)'s GEPA (Generalized Error-driven Prompt Augmentation) optimizer to improve language model performance on mathematical reasoning tasks. We'll work with the [NuminaMath-1.5 dataset](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5) and show how GEPA can boost accuracy through automated prompt optimization.\n",
    "\n",
    "**What you'll learn:**\n",
    "- Setting up DSPy with language models ([OpenRouter](https://openrouter.ai/)) \n",
    "- Processing and filtering mathematical problem datasets\n",
    "- Building a baseline Chain-of-Thought reasoning program\n",
    "- Optimizing prompts with GEPA using error-driven feedback\n",
    "- Evaluating improvements in model accuracy\n",
    "\n",
    "\n",
    "GEPA works by analyzing errors, generating targeted feedback, and automatically refining prompts to address common failure patterns. This makes it particularly effective for complex reasoning tasks where prompt quality significantly impacts performance.\n",
    "\n",
    "**Key Resources:**\n",
    "- [DSPy Documentation](https://dspy.ai/learn/programming/)\n",
    "- [Chain-of-Thought Prompting Paper](https://arxiv.org/abs/2201.11903)\n",
    "- [GEPA Optimizer Guide](https://dspy.ai/api/optimizers/GEPA/)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99b369f9",
   "metadata": {},
   "source": [
    "## Installation and Setup\n",
    "\n",
    "Install required dependencies and import libraries for DSPy, dataset processing, and model configuration.\n",
    "\n",
    "**Installation Options:**\n",
    "- **uv** - Fast Python package installer ([documentation](https://docs.astral.sh/uv/))\n",
    "- **pip** - Traditional Python package manager\n",
    "\n",
    "**Key Dependencies:**\n",
    "- `dspy` - DSPy framework for language model programming\n",
    "- `datasets` - Hugging Face datasets library for loading NuminaMath-1.5\n",
    "- `python-dotenv` - Environment variable management for API keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6lfe42g2q12",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install with uv (recommended - faster)\n",
    "!uv pip install dspy datasets python-dotenv\n",
    "\n",
    "# Alternative: Install with pip\n",
    "# !pip install dspy datasets python-dotenv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "72b0b006",
   "metadata": {},
   "outputs": [],
   "source": [
    "import dspy\n",
    "from datasets import load_dataset\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7050fb94",
   "metadata": {},
   "source": [
    "## Language Model Configuration\n",
    "\n",
    "Configure your language model - either local (Ollama) or cloud-based (OpenRouter) - for use with DSPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6ff83c74",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "load_dotenv(\"../../.env\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af11aa10",
   "metadata": {},
   "source": [
    "### Model Selection Rationale\n",
    "\n",
    "**Main LM: `openrouter/openai/gpt-4.1-nano`**\n",
    "\n",
    "*Primary Role:* High-volume inference during baseline evaluation and GEPA optimization iterations\n",
    "\n",
    "*Key Selection Criteria:*\n",
    "1. **Cost Efficiency** - $0.10/M input tokens, $0.40/M output tokens (~90% cheaper than GPT-4.1 or Claude)\n",
    "2. **Low Latency** - Fastest GPT-4.1 variant, enables rapid iteration with 16-32 parallel threads\n",
    "3. **Adequate Performance** - 60-65% baseline accuracy (MMLU: 80.1%, GPQA: 50.3%)\n",
    "4. **Context Window** - 1M tokens for long chain-of-thought reasoning\n",
    "\n",
    "---\n",
    "\n",
    "**Reflection LM: `openrouter/qwen/qwen3-next-80b-a3b-thinking`**\n",
    "\n",
    "*Primary Role:* Deep error analysis and prompt improvement during GEPA's reflection phase\n",
    "\n",
    "*Key Selection Criteria:*\n",
    "1. **Advanced Reasoning** - \"Thinking\" variant specialized for analytical reasoning and pattern identification\n",
    "2. **Quality Over Speed** - ~16 reflection calls vs 2000+ inference calls, can afford slower, higher-quality model\n",
    "3. **Context Handling** - 10M token context window for processing multiple training examples\n",
    "4. **Cost Trade-off** - More expensive per token but negligible total cost due to low volume\n",
    "\n",
    "**Architecture Philosophy:** Use a cheap, fast model for high-volume inference (99% of calls) and a smart, analytical model for low-volume reflection (1% of calls). This asymmetric design optimizes for both cost efficiency and learning quality."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2i6lg4sa0o",
   "metadata": {},
   "source": [
    "### Understanding GEPA's Two-Model Architecture\n",
    "\n",
    "GEPA's breakthrough innovation lies in its **dual-model approach** for reflective prompt optimization, which fundamentally differs from traditional single-model optimizers.\n",
    "\n",
    "**Why Two Models?**\n",
    "\n",
    "Traditional prompt optimizers rely on scalar metrics (accuracy scores) to guide improvements, essentially using trial-and-error without understanding *why* predictions fail. GEPA introduces a revolutionary approach by separating concerns:\n",
    "\n",
    "**1. Student LM (Inference Model)**\n",
    "- **Role**: Primary model that executes tasks and generates predictions\n",
    "- **Characteristics**: Fast, cost-efficient, handles high-volume inference\n",
    "- **Usage Pattern**: ~90-95% of all API calls during optimization\n",
    "- **In This Notebook**: `openrouter/openai/gpt-4.1-nano`\n",
    "\n",
    "**2. Reflection LM (Meta-Cognitive Model)**\n",
    "- **Role**: Analyzes failures, identifies patterns, and generates prompt improvements\n",
    "- **Characteristics**: Stronger reasoning, analytical depth, interpretability\n",
    "- **Usage Pattern**: ~5-10% of API calls (only during reflection phases)\n",
    "- **In This Notebook**: `openrouter/qwen/qwen3-next-80b-a3b-thinking`\n",
    "\n",
    "**The Reflective Optimization Cycle:**\n",
    "\n",
    "```\n",
    "1. Student LM solves training problems → predictions\n",
    "2. Metric provides rich textual feedback on failures\n",
    "3. Reflection LM analyzes batches of failures → identifies patterns\n",
    "4. Reflection LM generates improved prompt instructions\n",
    "5. Student LM tests new prompts → validation\n",
    "6. Repeat until convergence\n",
    "```\n",
    "\n",
    "**Research Foundation:**\n",
    "\n",
    "This approach is detailed in the paper [\"Reflective Prompt Evolution Can Outperform Reinforcement Learning\"](https://arxiv.org/abs/2507.19457), which demonstrates that reflective optimization with textual feedback outperforms reinforcement learning approaches on complex reasoning tasks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4a30103e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ OpenRouter LM configured successfully!\n",
      "Main model: openrouter/openai/gpt-4.1-nano\n",
      "Reflection model: openrouter/qwen/qwen3-next-80b-a3b-thinking\n"
     ]
    }
   ],
   "source": [
    "# ============================================\n",
    "# OpenRouter Language Model Configuration\n",
    "# ============================================\n",
    "# Requires OPENROUTER_API_KEY environment variable\n",
    "# Sign up at https://openrouter.ai/ to get your API key\n",
    "\n",
    "# # Main LM for inference\n",
    "open_router_lm = dspy.LM(\n",
    "    'openrouter/openai/gpt-4.1-nano', \n",
    "    api_key=os.getenv('OPENROUTER_API_KEY'), \n",
    "    api_base='https://openrouter.ai/api/v1',\n",
    "    max_tokens=65536,\n",
    "    temperature=1.0\n",
    ")\n",
    "\n",
    "# # Reflection LM for GEPA optimization\n",
    "reflection_lm = dspy.LM(\n",
    "    'openrouter/qwen/qwen3-next-80b-a3b-thinking', \n",
    "    api_key=os.getenv('OPENROUTER_API_KEY'), \n",
    "    api_base='https://openrouter.ai/api/v1',\n",
    "    max_tokens=65536,\n",
    "    temperature=1.0\n",
    ")\n",
    "\n",
    "# Set OpenRouter as default LM\n",
    "dspy.configure(lm=open_router_lm)\n",
    "\n",
    "print(\"✅ OpenRouter LM configured successfully!\")\n",
    "print(f\"Main model: openrouter/openai/gpt-4.1-nano\")\n",
    "print(f\"Reflection model: openrouter/qwen/qwen3-next-80b-a3b-thinking\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1966f085",
   "metadata": {},
   "source": [
    "## Dataset Preparation Functions\n",
    "\n",
    "Helper functions to process the dataset, split it into train/val/test sets, and preview examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0fa8153",
   "metadata": {},
   "outputs": [],
   "source": [
    "def init_dataset(\n",
    "    train_split_ratio: float = None, \n",
    "    test_split_ratio: float = None, \n",
    "    val_split_ratio: float = None, \n",
    "    sample_fraction: float = 1.0\n",
    ") -> tuple[list, list, list]:\n",
    "    \"\"\"\n",
    "    Initialize and split the NuminaMath-1.5 dataset into train/val/test sets.\n",
    "    \n",
    "    Loads the dataset, filters for numeric answers, converts to DSPy Examples,\n",
    "    shuffles with fixed seed for reproducibility, and optionally samples a fraction.\n",
    "    \n",
    "    Args:\n",
    "        train_split_ratio: Proportion for training (default: 0.5)\n",
    "        test_split_ratio: Proportion for testing (default: 0.45)\n",
    "        val_split_ratio: Proportion for validation (default: 0.05)\n",
    "        sample_fraction: Fraction of dataset to use (default: 1.0 = full dataset)\n",
    "    \n",
    "    Returns:\n",
    "        Tuple of (train_set, val_set, test_set) as lists of DSPy Examples\n",
    "    \n",
    "    Raises:\n",
    "        AssertionError: If split ratios don't sum to 1.0\n",
    "    \"\"\"\n",
    "    # Set default split ratios\n",
    "    if train_split_ratio is None:\n",
    "        train_split_ratio = 0.5\n",
    "    if test_split_ratio is None:\n",
    "        test_split_ratio = 0.4\n",
    "    if val_split_ratio is None:\n",
    "        val_split_ratio = 0.1\n",
    "    \n",
    "    # Validate split ratios sum to 1.0\n",
    "    assert (train_split_ratio + test_split_ratio + val_split_ratio) == 1.0, \"Ratios must sum to 1.0\"\n",
    "\n",
    "    # Load dataset from Hugging Face Hub\n",
    "    train_split = load_dataset(\"AI-MO/NuminaMath-1.5\")['train']\n",
    "    \n",
    "    # Convert to DSPy Examples with input/output fields\n",
    "    train_split = [\n",
    "        dspy.Example({\n",
    "            \"problem\": x['problem'],\n",
    "            'solution': x['solution'],\n",
    "            'answer': x['answer'],\n",
    "        }).with_inputs(\"problem\")  # Mark 'problem' as input field\n",
    "        for x in train_split\n",
    "    ]\n",
    "    \n",
    "    # Shuffle with fixed seed for reproducibility\n",
    "    import random\n",
    "    random.Random(0).shuffle(train_split)\n",
    "    tot_num = len(train_split)\n",
    "    print(f\"Total number of examples after filtering: {tot_num}\")\n",
    "\n",
    "    # Apply sampling if requested\n",
    "    if sample_fraction < 1.0:\n",
    "        sample_num = int(tot_num * sample_fraction)\n",
    "        train_split = train_split[:sample_num]\n",
    "        tot_num = sample_num\n",
    "        print(f\"Sampled down to {sample_num} examples.\")\n",
    "    \n",
    "    # Split into train/val/test based on ratios\n",
    "    train_end = int(train_split_ratio * tot_num)\n",
    "    val_end = int((train_split_ratio + val_split_ratio) * tot_num)\n",
    "    \n",
    "    train_set = train_split[:train_end]\n",
    "    val_set = train_split[train_end:val_end]\n",
    "    test_set = train_split[val_end:]\n",
    "\n",
    "    return train_set, val_set, test_set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "cce7ec2b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of examples after filtering: 896215\n",
      "Sampled down to 224 examples.\n",
      "112 22 90\n"
     ]
    }
   ],
   "source": [
    "train_set, val_set, test_set = init_dataset(sample_fraction=0.00025)\n",
    "\n",
    "print(len(train_set), len(val_set), len(test_set))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ee4324ab",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Problem:\n",
      "In the diagram, $AB = 15\\text{ cm},$ $DC = 24\\text{ cm},$ and $AD = 9\\text{ cm}.$ What is the length of $AC,$ to the nearest tenth of a centimeter?\n",
      "\n",
      "[asy]\n",
      "draw((0,0)--(9,16)--(33,16)--(9,0)--cycle,black+linewidth(1));\n",
      "draw((9,16)--(9,0),black+linewidth(1));\n",
      "draw((0,0)--(33,16),black+linewidth(1));\n",
      "draw((9,0)--(9,0.5)--(8.5,0.5)--(8.5,0)--cycle,black+linewidth(1));\n",
      "draw((9,16)--(9.5,16)--(9.5,15.5)--(9,15.5)--cycle,black+linewidth(1));\n",
      "label(\"$A$\",(0,0),NW);\n",
      "label(\"$B$\",(9,16),NW);\n",
      "label(\"$C$\",(33,16),E);\n",
      "label(\"$D$\",(9,0),SE);\n",
      "label(\"15 cm\",(0,0)--(9,16),NW);\n",
      "label(\"9 cm\",(0,0)--(9,0),S);\n",
      "label(\"24 cm\",(9,0)--(33,16),SE);\n",
      "[/asy]\n",
      "\n",
      "\n",
      "Solution:\n",
      "Extend $AD$ to point $E$ where it intersects the perpendicular from $C$ on $BC$'s extension.\n",
      "\n",
      "[asy]\n",
      "draw((0,0)--(9,16)--(33,16)--(9,0)--cycle,black+linewidth(1));\n",
      "draw((9,16)--(9,0),black+linewidth(1));\n",
      "draw((0,0)--(33,16),black+linewidth(1));\n",
      "draw((9,0)--(9,0.5)--(8.5,0.5)--(8.5,0)--cycle,black+linewidth(1));\n",
      "draw((9,16)--(9.5,16)--(9.5,15.5)--(9,15.5)--cycle,black+linewidth(1));\n",
      "label(\"$A$\",(0,0),NW);\n",
      "label(\"$B$\",(9,16),NW);\n",
      "label(\"$C$\",(33,16),E);\n",
      "label(\"$D$\",(9,0),SE);\n",
      "draw((9,0)--(33,0),black+linewidth(1)+dashed);\n",
      "draw((33,0)--(33,16),black+linewidth(1)+dashed);\n",
      "draw((33,0)--(33,0.5)--(32.5,0.5)--(32.5,0)--cycle,black+linewidth(1));\n",
      "label(\"$E$\",(33,0),SE);\n",
      "label(\"18 cm\",(9,0)--(33,0),S);\n",
      "label(\"16 cm\",(33,0)--(33,16),E);\n",
      "[/asy]\n",
      "\n",
      "Using the Pythagorean theorem in $\\triangle ADB$, calculate $BD^2 = BA^2 - AD^2 = 15^2 - 9^2 = 144$, so $BD = 12\\text{ cm}$.\n",
      "\n",
      "In $\\triangle DBC$, compute $BC^2 = DC^2 - BD^2 = 24^2 - 12^2 = 432$, thus $BC = 18\\text{ cm}$.\n",
      "\n",
      "Recognize $BCED$ as a rectangle, hence $DE = BC = 18\\text{ cm}$ and $CE = BD = 12\\text{ cm}$.\n",
      "\n",
      "Examine $\\triangle AEC$ with $AE = AD + DE = 9 + 18 = 27\\text{ cm}$, then apply Pythagorean theorem:\n",
      "\\[ AC^2 = AE^2 + CE^2 = 27^2 + 12^2 = 729 + 144 = 873 \\]\n",
      "\\[ AC = \\sqrt{873} \\approx \\boxed{29.5\\text{ cm}} \\]\n",
      "\n",
      "\n",
      "Answer:\n",
      "29.5\\text{ cm}\n"
     ]
    }
   ],
   "source": [
    "print(\"Problem:\")\n",
    "print(train_set[0]['problem'])\n",
    "print(\"\\n\\nSolution:\")\n",
    "print(train_set[0]['solution'])\n",
    "print(\"\\n\\nAnswer:\")\n",
    "print(train_set[0]['answer'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "d89019c0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a cistern is two - third full of water . pipe a can fill the remaining part in 12 minutes and pipe b in 8 minutes . once the cistern is emptied , how much time will they take to fill it together completely ?\n",
      "\n",
      "\n",
      "Answer:\n",
      "14.4\n"
     ]
    }
   ],
   "source": [
    "print(test_set[0]['problem'])\n",
    "print(\"\\n\\nAnswer:\")\n",
    "print(test_set[0]['answer'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3659214d",
   "metadata": {},
   "source": [
    "## Baseline Chain-of-Thought Program\n",
    "\n",
    "Create a simple baseline using DSPy's Chain-of-Thought module to establish initial performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "8a885ac5",
   "metadata": {},
   "outputs": [],
   "source": [
    "class GenerateResponse(dspy.Signature):\n",
    "    \"\"\"Solve the problem and provide the answer in the correct format.\"\"\"\n",
    "    problem = dspy.InputField()\n",
    "    answer = dspy.OutputField()\n",
    "\n",
    "program = dspy.ChainOfThought(GenerateResponse)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a5ee6de",
   "metadata": {},
   "source": [
    "## Evaluation Metric\n",
    "\n",
    "Define the evaluation metric to compare model predictions against ground truth answers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "11b652f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_integer_answer(answer):\n",
    "    try:\n",
    "        # find the last token that has a number in it\n",
    "        answer = [token for token in answer.split() if any(c.isdigit() for c in token)][-1]\n",
    "        answer = answer.split(\".\")[0]\n",
    "        answer = \"\".join([c for c in answer if c.isdigit()])\n",
    "        answer = int(answer)\n",
    "\n",
    "    except (ValueError, IndexError, TypeError):\n",
    "        answer = 0\n",
    "\n",
    "    return answer\n",
    "\n",
    "def metric(gold, pred, trace=None):\n",
    "    return int(parse_integer_answer(str(gold.answer))) == int(parse_integer_answer(str(pred.answer)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07134dea",
   "metadata": {},
   "source": [
    "## Baseline Evaluation\n",
    "\n",
    "Evaluate the baseline Chain-of-Thought program to establish our starting accuracy before optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "0cc4aef2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0%|          | 0/90 [00:00<?, ?it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 35.00 / 59 (59.3%):  64%|██████▍   | 58/90 [00:25<00:16,  1.89it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/10/04 20:23:05 WARNING dspy.adapters.json_adapter: Failed to use structured output format, falling back to JSON mode.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 47.00 / 90 (52.2%): 100%|██████████| 90/90 [00:45<00:00,  1.98it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/10/04 20:23:25 INFO dspy.evaluate.evaluate: Average Metric: 47 / 90 (52.2%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "problem",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "solution",
         "rawType": "object",
         "type": "unknown"
        },
        {
         "name": "example_answer",
         "rawType": "object",
         "type": "unknown"
        },
        {
         "name": "reasoning",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "pred_answer",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "metric",
         "rawType": "object",
         "type": "string"
        }
       ],
       "ref": "6b82d858-390d-4984-98cf-cb75ede8386c",
       "rows": [
        [
         "0",
         "a cistern is two - third full of water . pipe a can fill the remaining part in 12 minutes and pipe b in 8...",
         "First, let's find out how much time it would take for each pipe to fill the cistern completely if it were empty. If Pipe A...",
         "14.4",
         "The cistern is initially two-thirds full, so the remaining part to fill is one-third. Pipe A can fill the remaining one-third in 12 minutes, and...",
         "4.8 minutes",
         ""
        ],
        [
         "1",
         "In the geometric sequence $\\{a_n\\}$ with common ratio $q > 1$, we have $a_2 = 2$ and the sum of the first three terms $S_3...",
         "(1) Since $a_2 = 2$ and $S_3 = 7$ with $q > 1$, We have the system of equations: $\\begin{cases} a_1q = 2 \\\\ a_1(1...",
         "\\frac{n}{n+1}",
         "Given that \\(\\{a_n\\}\\) is a geometric sequence with ratio \\(q > 1\\), we can write the general term as \\(a_n = a_1 q^{n-1}\\). From the...",
         "(1) The general term formula for the sequence \\(\\{a_n\\}\\) is \\(\\boxed{a_n = 2^{n-1}}\\). (2) The sum of the first \\(n\\) terms of \\(\\{c_n\\}\\) is \\(\\boxed{T_n...",
         "✔️ [True]"
        ],
        [
         "2",
         "Given that the derivative of the function $f(x)$ is $f'(x)$, and it satisfies the relationship $f(x) = x^{2} + 3xf'(2) + \\ln x$, find the...",
         "Given that the derivative of the function $f(x)$ is $f'(x)$, and it satisfies the relationship $f(x) = x^{2} + 3xf'(2) + \\ln x$, we want...",
         "-\\dfrac{9}{4}",
         "The function given is \\[f(x) = x^2 + 3x f'(2) + \\ln x.\\] Note that \\(f'(x)\\) is the derivative of \\(f(x)\\). We can differentiate both...",
         "\\(\\boxed{-\\frac{9}{4}}\\)",
         "✔️ [True]"
        ],
        [
         "3",
         "67. Given a point $A$ and a line $l$, $B$ is an arbitrary point on $l$. Find the geometric locus of points $M$ such that...",
         "67. The desired geometric locus of points consists of two lines passing through the point symmetric to point $A$ with respect to line $l$, and...",
         "notfound",
         "Given a fixed point \\(A\\) and a line \\(l\\), and an arbitrary point \\(B\\) on \\(l\\), we consider the equilateral triangle \\(ABM\\). For each position...",
         "The locus of points \\(M\\) such that \\(ABM\\) is an equilateral triangle, with \\(A\\) fixed and \\(B\\) varying along \\(l\\), is the union of two...",
         ""
        ],
        [
         "4",
         "A unit has a total of 620 staff members. To investigate the time workers spend commuting, it was decided to survey 62 workers using a...",
         "**Analysis** This question examines systematic sampling, which is key to understanding that systematic sampling involves sampling at equal intervals, forming an arithmetic sequence. This is...",
         "394",
         "The total staff is 620, divided into 62 segments, which implies each segment has 10 members (since 620 ÷ 62 = 10). The sampling starts...",
         "394",
         "✔️ [True]"
        ],
        [
         "5",
         "6. Given a prime $p \\equiv 3(\\bmod 4)$. For an integer sequence of length no more than $p-1$ composed of $\\pm 1, \\pm 2, \\cdots,...",
         "6. For $1 \\leqslant k \\leqslant \\frac{p-1}{2}$, the number of balanced sequences with $2k$ terms is $$ \\begin{array}{l} \\mathrm{C}_{2 k}^{k}\\left(\\frac{p-1}{2}\\right)^{k}\\left(\\frac{p-1}{2}\\right)^{k}=\\mathrm{C}_{2 k}^{k}\\left(\\frac{p-1}{2}\\right)^{2 k} . \\\\ \\text...",
         "proof",
         "We are given a prime \\( p \\equiv 3 \\pmod{4} \\), and we consider sequences of length at most \\( p-1 \\), composed of elements...",
         "The number of balanced sequences \\( M_p \\) equals the central binomial coefficient \\(\\binom{p-1}{\\frac{p-1}{2}}\\). For primes \\( p \\equiv 3 \\pmod{4} \\), this binomial coefficient...",
         ""
        ],
        [
         "6",
         "Given the function $f(x)=a\\ln x+ \\frac {1}{2}x^{2}+(a+1)x+1$. $(1)$ When $a=-1$, find the interval of monotonic increase for the function $f(x)$; $(2)$ If the function $f(x)$...",
         "Solution: $(1)$ When $a=-1$, $f(x)=-\\ln x+ \\frac {1}{2}x^{2}+1$. Then $f′(x)=- \\frac {1}{x}+x$. Let $f′(x) > 0$, we get $- \\frac {1}{x}+x > 0$, which is...",
         "3-2 \\sqrt {2}",
         "Let's analyze the function \\(f(x) = a \\ln x + \\frac{1}{2} x^2 + (a+1) x + 1\\). ### Part (1) When \\(a=-1\\), \\(f(x) = -...",
         "(1) The function \\(f(x)\\) is increasing on \\([1, \\infty)\\). (2) The function \\(f(x)\\) is increasing on \\((0, \\infty)\\) when \\(a \\geq 0\\). (3) The minimum...",
         "✔️ [True]"
        ],
        [
         "7",
         "On a line, there are 2010 points. We perform the following operation: insert one point between every two adjacent points. After conducting this operation 3...",
         "Initially, with 2010 points on the line, there are 2009 segments between these points. After the first operation, we insert 2009 new points, resulting in...",
         "16073",
         "Initially, there are 2010 points on the line. Each operation involves inserting one point between every pair of adjacent points, which effectively doubles the number...",
         "16073",
         "✔️ [True]"
        ],
        [
         "8",
         "Jackie loves to climb trees. She climbed a 1000 foot tall tree. Then she climbed 2 trees that were a certain fraction of the height...",
         "Let's denote the height of the first tree as \\( H_1 \\) and the height of each of the two trees as \\( H_2 \\)....",
         "1:2",
         "Let's denote the height of the first tree as \\( h \\). Jackie then climbs two more trees that are each a certain fraction \\(...",
         "The ratio of the heights of the two trees to the first tree is \\(\\frac{2}{13}\\).",
         ""
        ],
        [
         "9",
         "From a group of eight students, how many different five-student committees can be chosen?",
         "We use the combination formula to solve this problem. The formula to calculate the number of ways to choose $k$ students from a group of...",
         "56",
         "To determine the number of different five-student committees that can be formed from a group of eight students, we need to find the number of...",
         "56",
         "✔️ [True]"
        ],
        [
         "10",
         "In $\\triangle ABC$, $BC= a$, $AC= b$, $AB = c$, ${{a}^{2}} -{{c}^{2}} = {{b}^{2}} - bc$, find the angle $A$.",
         "[Analysis] The main focus of this problem is to apply the cosine rule. [Solution] We have $\\cos A= \\frac{{b}^{2}+{c}^{2}-{a}^{2}}{2bc}= \\frac{bc}{2bc}= \\frac{1}{2}$ (using the provided equation...",
         "\\frac{π}{3}",
         "Given the sides of \\(\\triangle ABC\\): \\(BC = a\\), \\(AC = b\\), \\(AB = c\\). The given relation is: \\[a^2 - c^2 = b^2 -...",
         "\\(\\boxed{60^\\circ}\\)",
         ""
        ],
        [
         "11",
         "In the rectangular coordinate system, a polar coordinate system is established with the origin as the pole and the positive semi-axis of the $x$-axis as...",
         "(1) From ${\\cos }^{2}\\alpha+{\\sin }^{2}\\alpha=1$ we obtain the general equation of curve $C$ as: $$\\frac{y^{2}}{9}+ \\frac{x^{2}}{4}=1$$ Using $x=\\rho\\cos \\theta$ and $y=\\rho\\sin \\theta$ we find the...",
         "\\frac{27}{2}",
         "First, I will solve part (1). Part (1): (a) To find the rectangular coordinate equation of line \\( l \\), given by the polar form:...",
         "(1) The line in rectangular coordinates is \\( x - 2 y = 4 \\). The curve is \\( \\frac{x^2}{4} + \\frac{y^2}{9} = 1 \\)....",
         ""
        ],
        [
         "12",
         "I randomly pick an integer $p$ between $1$ and $15$ inclusive. What is the probability that I choose a $p$ such that there exists an...",
         "First, let's solve the equation $pq - 6p - 3q = 3$ by manipulating it similarly to the original problem: 1. Add 18 to both...",
         "\\frac{4}{15}",
         "We are given the equation \\( pq - 6p - 3q = 3 \\) and a range for \\( p \\) from 1 to 15....",
         "\\(\\boxed{\\frac{4}{15}}\\)",
         "✔️ [True]"
        ],
        [
         "13",
         "$29 \\cdot 5$ is formed by the consecutive two-digit integers from 19 to 92 arranged in sequence to form the integer $N=$ $19202122 \\cdots 909192$....",
         "[Solution] A natural number $n$ is divisible by 3 or 9 $\\Leftrightarrow$ the sum of its digits is divisible by 3 or 9. Consider $$...",
         "1",
         "The number \\(N\\) is formed by concatenating the integers from 19 to 92: \"192021...909192\". We need to find the highest power of 3 dividing \\(N\\),...",
         "1",
         "✔️ [True]"
        ],
        [
         "14",
         "Given an ellipse equation $\\Gamma$: $\\frac{x^2}{a^2}+\\frac{y^2}{b^2}=1\\left(a \\gt b \\gt 0\\right)$, a line $l: x+y-4\\sqrt{2}=0$, the lower end points of $\\Gamma$ are $A$, $M$ is on...",
         "### Solution: #### Part (1) Given $a=2$, we can find the semi-minor axis $b$ and the linear eccentricity $c$ as follows: - Since $a^2 =...",
         "d_{\\min} = \\frac{8}{3}",
         "For the second part, line l intersects the y-axis at B: when x=0, y=-4√2. Since line l: x + y - 4√2=0, the point B...",
         "2) The value of b is \u00168/5, or \u00160\u00167, which equals 1.2.",
         ""
        ],
        [
         "15",
         "Given the derivative of the function $f(x)$ is $f'(x)$, and it satisfies $f(x)=2xf'(1)+x^{2}$, then $f'(1)=$　　()\nA: $-1$\nB: $-2$\nC: $1$\nD: $2$",
         "Solution: $f'(x)=2f'(1)+2x$, By setting $x=1$, we get $f'(1)=2f'(1)+2$, $\\therefore f'(1)=-2$, Therefore, the correct answer is $\\boxed{\\text{B}}$. By using the rules of differentiation, we find $f'(x)$,...",
         "\\text{B}",
         "Given the function \\(f(x)\\) and its derivative \\(f'(x)\\), the problem states that \\[ f(x) = 2x f'(1) + x^2. \\] To find \\(f'(1)\\), differentiate both...",
         "-2",
         ""
        ],
        [
         "16",
         "If the function $f(x) = 2x^2 - \\ln x$ is not monotonous in a subinterval $(k - 1, k + 1)$ of its domain, then...",
         "1. First, let's find the domain of the function $f(x) = 2x^2 - \\ln x$. Since the natural logarithm $\\ln x$ is only defined for...",
         "1 < k < \\frac{3}{2}",
         "The function given is \\(f(x) = 2x^2 - \\ln x\\), which is defined for \\(x > 0\\). To determine when \\(f\\) is not monotonous in...",
         "B: \\(\\left[\\begin{matrix}1, \\frac{3}{2}\\end{matrix}\\right)\\)",
         "✔️ [True]"
        ],
        [
         "17",
         "Determine how many more digits does the base-4 representation of $1729_{10}$ have than the base-6 representation of $1729_{10}$?",
         "First, we compute the number of digits in the base-4 representation of $1729_{10}$: Given that $1729 1024 = 4^5$, the base-4 representation must use 6...",
         "1",
         "To find how many more digits the base-4 representation of 1729 is than the base-6 representation, we need to determine the number of digits in...",
         "1",
         "✔️ [True]"
        ],
        [
         "18",
         "10.",
         "A",
         "A",
         "The problem \"10.\" appears to be incomplete or possibly refers to a list or question numbered 10. Without additional context or specific instructions, the most...",
         "10",
         ""
        ],
        [
         "19",
         "The chord length cut by the line $y=x$ from the circle $x^2+(y+2)^2=4$ is (　　)\n\nA: $\\sqrt{2}$\n\nB: $2\\sqrt{2}$\n\nC: $\\sqrt{3}$\n\nD: $2\\sqrt{3}$",
         "The center of the circle $x^2+(y+2)^2=4$ is at $(0, -2)$, and its radius is 2. Since the distance from the center of the circle to...",
         "\\text{B}",
         "We are given the circle \\( x^2 + (y+2)^2 = 4 \\), which centers at \\( (0, -2) \\) with radius 2. The line cutting...",
         "\\( 2 \\sqrt{2} \\)",
         ""
        ],
        [
         "20",
         "If a farmer sells ducks for $10 each and chickens for $8 each, and he sells 5 chickens along with some ducks, he then uses...",
         "Let's denote the number of ducks the farmer sold as \\( d \\). First, we calculate the total revenue from selling the chickens and ducks:...",
         "8",
         "Let's define variables: - Let \\( D \\) = number of ducks sold - The farmer sells 5 chickens, each for $8, so total earnings...",
         "8",
         "✔️ [True]"
        ],
        [
         "21",
         "(3) Six families are sequentially numbered as $1, 2, 3, 4, 5, 6$. Each family has three members, and they all gather to play a...",
         "(3) From 18 people, selecting 3 people, the number of different ways to select is $\\mathrm{C}_{18}^{3}=\\frac{18 \\times 17 \\times 16}{3 \\times 2 \\times 1}=816$. Since...",
         "\\frac{45}{68}",
         "There are 6 families with 3 members each, totaling 18 members. The game is played in groups of 3, with one group formed at a...",
         "\\( \\boxed{\\frac{45}{68}} \\)",
         "✔️ [True]"
        ],
        [
         "22",
         "A supermarket has 2355 loaves of bread at the start of the day. By mid-morning, they sell 30% of their loaf stock, and a bakery...",
         "Let's calculate the number of loaves sold by mid-morning first: 30% of 2355 loaves = 0.30 * 2355 = 706.5 loaves Since we can't have...",
         "2888",
         "The supermarket starts with 2355 loaves. First, they sell 30% of their stock during mid-morning: - Loaves sold = 30% of 2355 = 0.30 ×...",
         "2664",
         ""
        ],
        [
         "23",
         "In the \"Love China\" high school speech competition, 6 judges gave the contestant Xiao Ming the following scores: 7, 9, 6, 7, 9, 8. What...",
         "To find the mode of the set of scores given to Xiao Ming in the \"Love China\" high school speech competition, we first list out...",
         "7 \\text{ and } 9",
         "The mode of a data set is the number that appears most frequently. In the scores given: 7, 9, 6, 7, 9, 8, we count...",
         "The modes of the data set are 7 and 9.",
         "✔️ [True]"
        ],
        [
         "24",
         "1.019. $\\frac{0.125: 0.25+1 \\frac{9}{16}: 2.5}{(10-22: 2.3) \\cdot 0.46+1.6}+\\left(\\frac{17}{20}+1.9\\right) \\cdot 0.5$.",
         "## Solution.\n\n$$\n\\begin{aligned}\n& \\frac{0.125: 0.25+1 \\frac{9}{16}: 2.5}{(10-22: 2.3): 0.46+1.6}+\\left(\\frac{17}{20}+1.9\\right) \\cdot 0.5=\\frac{\\frac{1}{2}+\\frac{5}{8}}{\\left(10-\\frac{220}{23}\\right) \\cdot \\frac{23}{50}+\\frac{8}{5}}+\\frac{17}{40}+\\frac{19}{20}= \\\\\n& =\\frac{\\frac{9}{8}}{\\frac{1}{5}+\\frac{8}{5}}+\\frac{17}{40}+\\frac{38}{40}=\\frac{5}{8}+\\frac{11}{8}=2\n\\end{aligned}\n$$\n\nAnswer: 2.",
         "2",
         "The problem appears to be a complex mathematical expression that needs to be carefully simplified step by step. First, identify and parse each part: Numerator:...",
         "2",
         "✔️ [True]"
        ],
        [
         "25",
         "If $f(x)=e^{-x}(\\cos x+\\sin x)$, then $f′(x)=$ ______.",
         "According to the problem, $f(x)=e^{-x}(\\cos x+\\sin x)= \\frac {\\cos x+\\sin x}{e^{x}}$, $f′(x)= \\frac {(\\cos x+\\sin x)′e^{x}-(\\cos x+\\sin x)\\cdot (e^{x})′}{e^{2x}}= \\frac {-2\\sin x}{e^{x}}=-2e^{-x}\\sin x$, The answer...",
         "-2e^{-x}\\sin x",
         "Given the function \\(f(x) = e^{-x}(\\cos x + \\sin x)\\). This is a product of two functions: \\(u(x) = e^{-x}\\) and \\(v(x) = \\cos x...",
         "\\(-2 e^{-x} \\sin x\\)",
         "✔️ [True]"
        ],
        [
         "26",
         "Joann rode her bike at an average speed of 16 miles per hour for three and a half hours. If her friend, Fran, rides for...",
         "First, let's calculate how far Joann traveled in her time biking: \\( \\text{Distance travelled by Joann} = 3.5 \\text{ hours} \\times 16 \\text{ mph} =...",
         "14",
         "First, find the distance Joann traveled by multiplying her speed by her time: 16 miles/hour * 3.5 hours = 56 miles. Fran needs to cover...",
         "14 miles per hour",
         "✔️ [True]"
        ],
        [
         "27",
         "Let $a$ and $b$ be the roots of the equation $x^2 - mx + 3 = 0$. Suppose that $a + \\frac{1}{b}$ and $b +...",
         "From Vieta's formulas for the first equation, we know $ab = 3$. Applying the new roots to the second equation, we need to find: \\[r...",
         "\\frac{16}{3}",
         "Given the quadratic equation \\(x^2 - mx + 3 = 0\\), its roots \\(a\\) and \\(b\\) satisfy Vieta's formulas: \\[ a + b = m...",
         "\\(\\boxed{\\frac{16}{3}}\\)",
         "✔️ [True]"
        ],
        [
         "28",
         "Given a triangle $ABC$, in which the medians $BE$ and $CF$ are perpendicular. Let $M$ is the intersection point of the medians of this triangle,...",
         "1. **Define Points and Properties:** - Let \\( D \\) be the midpoint of \\( BC \\). - Let \\( E \\) be the intersection...",
         null,
         "Given triangle \\( ABC \\), the medians are \\( BE \\) and \\( CF \\). Since \\( BE \\) and \\( CF \\) are medians,...",
         "In a triangle \\( ABC \\) where the medians \\( BE \\) and \\( CF \\) are perpendicular, the line \\( ML \\), connecting the...",
         "✔️ [True]"
        ],
        [
         "29",
         "The cave is some feet deep and they are already at 849 feet. If they are travelling at speed of 17, they have 369 feet...",
         "If they are already at 849 feet and they have 369 feet farther to go until they reach the end of the cave, we can...",
         "1218",
         "The total depth of the cave can be calculated by adding the distance already traveled to the remaining distance until the end of the cave....",
         "The depth of the cave is 1218 feet.",
         "✔️ [True]"
        ],
        [
         "30",
         "Pentagon ABCDE has a vertical line of symmetry. What is the $y$-coordinate of vertex C so that the area of the pentagon is 50 square...",
         "1. Calculate the area of square $ABDE$. Since each side of the square is $5$ units, its area is $5^2 = 25$ square units. 2....",
         "15",
         "The pentagon ABCDE has a vertical line of symmetry, which suggests that its shape is symmetric about a vertical axis, most likely passing through the...",
         "15",
         "✔️ [True]"
        ],
        [
         "31",
         "Let $n \\ge 5$ be an integer and let $T = \\{5,6,7,\\ldots,n\\}$. Find the smallest value of $n$ such that for every partition of $T$...",
         "We claim that $625$ is the minimal value of $n$. Let the two partitioned sets be $A$ and $B$; we will try to partition $5,...",
         "625",
         "We are given a set \\( T = \\{5, 6, 7, \\ldots, n\\} \\) for \\( n \\ge 5 \\). The goal is to find...",
         "11",
         ""
        ],
        [
         "32",
         "Shapovalov A.V. The Small and Large Islands have a rectangular shape and are divided into rectangular counties. In each county, a road is laid along...",
         "The image provides an example for nine counties.\n\n![](https://cdn.mathpix.com/cropped/2024_05_06_62351a5cb646b08ac6fbg-03.jpg?height=540&width=612&top_left_y=-1&top_left_x=729)\n\nSend a comment",
         "9",
         "The problem describes two islands formed by rectangular counties, each with roads along their diagonals, creating closed paths that do not cross or pass through...",
         "The Large Island has 5 counties arranged in a cycle forming a pentagon with roads laid along diagonals, creating a closed path without overlaps or...",
         ""
        ],
        [
         "33",
         "James takes 5 oranges and breaks each orange into 8 pieces. He splits the pieces between 4 people. If an orange has 80 calories, how...",
         "First, let's find out how many pieces of orange James has after breaking each one into 8 pieces. Since he has 5 oranges and breaks...",
         "100",
         "First, determine the total number of orange pieces James has. He starts with 5 oranges, and each orange is broken into 8 pieces, so the...",
         "Each person gets 100 calories.",
         "✔️ [True]"
        ],
        [
         "34",
         "Example 2 As shown in Figure 2, $AC$ and $CE$ are two diagonals of the regular hexagon $ABCDEF$. Points $M$ and $N$ internally divide $AC$...",
         "Solution: Let $\\overrightarrow{A C}=2 \\vec{a}, \\overrightarrow{A F}=2 \\vec{b}$, then $$ \\begin{array}{l} \\overrightarrow{A M}=r \\overrightarrow{A C}=2 r \\vec{a}, \\overrightarrow{A B}=\\vec{a}-\\vec{b} \\text {. } \\\\ \\text {...",
         "\\frac{\\sqrt{3}}{3}",
         "Let's position the regular hexagon \\( ABCDEF \\) conveniently for analysis. Place the hexagon in the coordinate plane with its center at the origin \\(...",
         "\\(\\boxed{\\frac{1}{\\sqrt{3}}}\\)",
         ""
        ],
        [
         "35",
         "in a group of 8 engineers , 3 engineers have a phd degree ; the others have only an ms or bs degree . a...",
         "To solve this problem, we can use combinations. We want to form teams of 3 engineers that include at least one engineer with a PhD....",
         "46",
         "There are 8 engineers in total, with 3 having a PhD and 5 with only an MS or BS degree. We need to select a...",
         "46",
         "✔️ [True]"
        ],
        [
         "36",
         "B2 Janez has built a fence around a rectangular garden that is fenced on three sides. The length of the fence is $60 \\mathrm{~m}$, and...",
         "B2. From the given data, we write the relations $2a + b = 60$ and $a \\cdot b = 352$. We solve the resulting system...",
         "22",
         "Let the length of the garden be \\(a\\) meters and the width be \\(b\\) meters, with \\(a > b\\). Since the garden is fenced on...",
         "The sides of the garden are 44 meters and 8 meters, with the longer side being 44 meters.",
         ""
        ],
        [
         "37",
         "A director first selects 2 leading actors from a list of 5 actors, which includes 2 actors who have won the Golden Rooster Award and...",
         "The probability is calculated as $\\frac{2}{5} \\times \\frac{1}{4} \\times 3 = \\boxed{\\frac{3}{10}}$.",
         "\\frac{3}{10}",
         "The total number of actors is 5, consisting of 2 Golden Rooster Award winners (G) and 3 Hundred Flowers Award winners (H). Step 1: Select...",
         "A: \\(\\boxed{\\frac{1}{3}}\\)",
         ""
        ],
        [
         "38",
         "Problem 10. Find all sequences of positive integers $\\left\\{a_{n}\\right\\}_{n=1}^{\\infty}$, such that $a_{4}=4$ and the identity $$ \\frac{1}{a_{1} a_{2} a_{3}}+\\frac{1}{a_{2} a_{3} a_{4}}+\\cdots+\\frac{1}{a_{n} a_{n+1} a_{n+2}}=\\frac{(n+3) a_{n}}{4 a_{n+1}...",
         "10. We rewrite the recurrence relation as $\\frac{(n+2) a_{n-1}}{4 a_{n} a_{n+1}}+\\frac{1}{a_{n} a_{n+1} a_{n+2}}=\\frac{(n+3) a_{n}}{4 a_{n+1} a_{n+2}} \\Longleftrightarrow(n+2) a_{n+2}=\\frac{(n+3) a_{n}^{2}-4}{a_{n-1}}$ for $n \\geq 3$. Setting $n=2$...",
         "a_{n}=n",
         "Given the sequence \\(\\{a_n\\}\\) of positive integers with \\(a_4=4\\), and the sum formula: \\[ \\frac{1}{a_1 a_2 a_3} + \\frac{1}{a_2 a_3 a_4} + \\cdots + \\frac{1}{a_n...",
         "The sequences \\(\\left\\{a_n\\right\\}\\) consist of positive integers with initial terms satisfying the relations: \\[ a_1 = \\frac{16}{5 a_2^2 - 4}, \\] where \\(a_2=1\\), resulting in...",
         ""
        ],
        [
         "39",
         "Find the area of isosceles triangle $DEF$, where sides $DE = DF = 5$ units and $\\angle D = 120^\\circ$.",
         "Since $DEF$ is an isosceles triangle with $\\angle D = 120^\\circ$: 1. Drop a perpendicular line from $D$ to base $EF$ at point $G$, splitting...",
         "\\frac{250}{9}",
         "The triangle \\( DEF \\) is isosceles with \\( DE = DF = 5 \\) units, and the angle between these equal sides, \\( \\angle...",
         "\\[\n\\boxed{\\frac{25 \\sqrt{3}}{4} \\text{ square units}}\n\\]",
         ""
        ],
        [
         "40",
         "If 70 honey bees make some grams of honey in 70 days, and 1 honey bee will make 1 gram of honey in 70 days,...",
         "If 1 honey bee makes 1 gram of honey in 70 days, then 70 honey bees would make 70 times as much honey in the...",
         "70",
         "The problem states that 1 honey bee makes 1 gram of honey in 70 days. Since 70 honey bees are working for 70 days, each...",
         "70 grams",
         "✔️ [True]"
        ],
        [
         "41",
         "3. How are the $\\binom{7}{3}=35$ vectors of weight 3 covered by the day-vectors? Seven of them are day-vectors, and the remaining 28 ones must be...",
         "Solution 2: If a dwarf $X$ performs the same type of work on three days $D_{1}, D_{2}, D_{3}$, then we say that this triple of...",
         "proof",
         "The problem discusses covering all vectors of weight 3 within a set of vectors, referred to as \"day-vectors.\" There are 35 vectors of weight 3,...",
         "The structure of the day-vectors is such that they consist of one vector of weight 0, seven vectors of weight 3, seven vectors of weight...",
         ""
        ],
        [
         "42",
         "Carly is making a beaded corset. She's going to add 50 rows of purple beads with 20 beads per row, 40 rows of blue beads...",
         "First, let's calculate the total number of purple and blue beads Carly will use. For the purple beads: 50 rows * 20 beads per row...",
         "\\$180",
         "First, calculate the total number of purple beads: 50 rows × 20 beads/row = 1,000 beads. Next, calculate the total number of blue beads: 40...",
         "80",
         ""
        ],
        [
         "43",
         "A certain bacterium splits into $2$ every $30$ minutes. How many bacteria of this kind can be produced after $3$ hours? A: $8$ B: $16$...",
         "To solve this problem, we need to understand how the bacterium multiplies over time. Given that one bacterium splits into $2$ every $30$ minutes, we...",
         "D",
         "The bacteria double every 30 minutes. Starting with one bacterium, after 30 minutes there are 2, after another 30 minutes (i.e., 1 hour in total),...",
         "D: 64",
         ""
        ],
        [
         "44",
         "79. (1) Prove that 1998 cannot be expressed as the sum of any number of consecutive odd numbers; (2) If the numbers from 1 to...",
         "Solution: (1) If it can be expressed, let $n_{0}$ be the first odd number, then $$ \\begin{array}{l} 1998= n_{0}+\\left(n_{0}+2\\right)+\\cdots+\\left(n_{0}+2 k\\right) \\\\ =\\left(n_{0}+2 k\\right)+\\left(n_{0}+2 k-2\\right)+\\cdots \\\\...",
         "586",
         "Part (1): The sum of consecutive odd numbers starting from 1 can be expressed as \\( (2k-1) + (2k+1) + \\dots \\), but a key...",
         "(1) Cannot be expressed as the sum of any number of consecutive odd numbers. (2) Cannot be achieved regardless of sign assignment. (3) Minimum of...",
         ""
        ],
        [
         "45",
         "Given the universal set $U=\\mathbb{R}$, and the set $M=\\{x|y= \\sqrt{1-x}\\}$, then the complement of $M$ in $U$, denoted as $\\complement_U M$, is ( ) A:...",
         "Since the universal set $U=\\mathbb{R}$, and the set $M=\\{x|y= \\sqrt{1-x}\\}=\\{x|x\\leq1\\}$, then $\\complement_U M=(1，+\\infty)$, thus, the correct choice is: $\\boxed{\\text{D}}$. By determining the set $M$, we...",
         "\\text{D}",
         "The set \\( M = \\{x \\mid y = \\sqrt{1 - x}\\} \\) seems to describe the set of all \\( x \\in \\mathbb{R} \\)...",
         "D: \\( (1, +\\infty) \\)",
         ""
        ],
        [
         "46",
         "Given $a, b \\in \\mathbb{R}$, then \"$0 \\leqslant a \\leqslant 1$ and $0 \\leqslant b \\leqslant 1$\" is a ( ) for \"$0 \\leqslant ab...",
         "If $0 \\leqslant a \\leqslant 1$ and $0 \\leqslant b \\leqslant 1$, then \"$0 \\leqslant ab \\leqslant 1$\" holds. If \"$0 \\leqslant ab \\leqslant 1$\",...",
         "A",
         "Given that \\( a, b \\in \\mathbb{R} \\) with \\( 0 \\leq a \\leq 1 \\) and \\( 0 \\leq b \\leq 1 \\), then...",
         "A: Sufficient but not necessary condition",
         "✔️ [True]"
        ],
        [
         "47",
         "27.1. (Belgium, 80). Each of the two urns contains white and black balls, and the total number of balls in both urns is 25. One...",
         "27.1. Let the total number of balls in the first and second urns be \\( m_{\\mathrm{i}} \\) and \\( m_{2} \\) respectively (for definiteness, assume...",
         "0.04",
         "Let the two urns be urn A and urn B. Let the number of white balls in urn A be \\( W_A \\), and black...",
         "\\(\\boxed{\\frac{1}{15}}\\)",
         ""
        ],
        [
         "48",
         "3. 26 Prove that the number of integer solutions satisfying the following conditions: $$ \\begin{array}{c} x_{1}+x_{2}+\\cdots+x_{n}=r, \\\\ 1 \\leqslant x_{n} \\leqslant k, \\quad i=1,2, \\cdots,...",
         "None Translate the text above into English, please retain the original text's line breaks and format, and output the translation result directly. Note: The provided...",
         "proof",
         "The problem asks for the number of integer solutions to the equation \\( x_1 + x_2 + \\cdots + x_n = r \\) with the...",
         "The number of integer solutions to the equation \\( x_1 + x_2 + \\cdots + x_n = r \\) with \\( 1 \\leq x_i \\leq...",
         ""
        ],
        [
         "49",
         "3. Given the equation $x^{4}-p x^{3}+q=0$ has an integer root, find the prime numbers $p$ and $q$. untranslated text: 已知方程 $x^{4}-p x^{3}+q=0$ 有一整数根,求素数 $p 、...",
         "3. Solution: Let the integer $x$ satisfy $q=x^{3}(p-x)$. Then $x \\mid q$. Since $q$ is a prime number, it can only be that $x= \\pm...",
         "p=3,q=2",
         "Let the integer root of the polynomial \\(x^4 - p x^3 + q = 0\\) be \\(r\\). Substituting \\(x = r\\) into the polynomial gives:...",
         "\\( p=3, \\quad q=2 \\)",
         ""
        ]
       ],
       "shape": {
        "columns": 6,
        "rows": 90
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>problem</th>\n",
       "      <th>solution</th>\n",
       "      <th>example_answer</th>\n",
       "      <th>reasoning</th>\n",
       "      <th>pred_answer</th>\n",
       "      <th>metric</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a cistern is two - third full of water . pipe a can fill the remai...</td>\n",
       "      <td>First, let's find out how much time it would take for each pipe to...</td>\n",
       "      <td>14.4</td>\n",
       "      <td>The cistern is initially two-thirds full, so the remaining part to...</td>\n",
       "      <td>4.8 minutes</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>In the geometric sequence $\\{a_n\\}$ with common ratio $q &gt; 1$, we ...</td>\n",
       "      <td>(1) Since $a_2 = 2$ and $S_3 = 7$ with $q &gt; 1$, We have the system...</td>\n",
       "      <td>\\frac{n}{n+1}</td>\n",
       "      <td>Given that \\(\\{a_n\\}\\) is a geometric sequence with ratio \\(q &gt; 1\\...</td>\n",
       "      <td>(1) The general term formula for the sequence \\(\\{a_n\\}\\) is \\(\\bo...</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Given that the derivative of the function $f(x)$ is $f'(x)$, and i...</td>\n",
       "      <td>Given that the derivative of the function $f(x)$ is $f'(x)$, and i...</td>\n",
       "      <td>-\\dfrac{9}{4}</td>\n",
       "      <td>The function given is \\[f(x) = x^2 + 3x f'(2) + \\ln x.\\] Note that...</td>\n",
       "      <td>\\(\\boxed{-\\frac{9}{4}}\\)</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>67. Given a point $A$ and a line $l$, $B$ is an arbitrary point on...</td>\n",
       "      <td>67. The desired geometric locus of points consists of two lines pa...</td>\n",
       "      <td>notfound</td>\n",
       "      <td>Given a fixed point \\(A\\) and a line \\(l\\), and an arbitrary point...</td>\n",
       "      <td>The locus of points \\(M\\) such that \\(ABM\\) is an equilateral tria...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A unit has a total of 620 staff members. To investigate the time w...</td>\n",
       "      <td>**Analysis** This question examines systematic sampling, which is ...</td>\n",
       "      <td>394</td>\n",
       "      <td>The total staff is 620, divided into 62 segments, which implies ea...</td>\n",
       "      <td>394</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>Darnel sprinted 0.88 lap and then took a break by jogging 0.75 lap...</td>\n",
       "      <td>To find out how many laps farther Darnel sprinted than jogged, we ...</td>\n",
       "      <td>0.13</td>\n",
       "      <td>Darnel sprinted 0.88 lap and then jogged 0.75 lap. To find how man...</td>\n",
       "      <td>Darnel sprinted 0.13 laps farther than he jogged.</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>In hexagon $FIGURE$, $\\angle F \\cong \\angle I \\cong \\angle U \\cong...</td>\n",
       "      <td>The sum of the angle measures in a hexagon is \\(180(6-2) = 720\\) d...</td>\n",
       "      <td>45^\\circ</td>\n",
       "      <td>The problem describes a hexagon named FIGURE with six vertices: F,...</td>\n",
       "      <td>30</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>A, B, C, and D enter into a partnership. A subscribes 1/3 of the c...</td>\n",
       "      <td>Let's denote the total capital as X. A subscribes 1/3 of the capit...</td>\n",
       "      <td>7/15</td>\n",
       "      <td>A's share of profit is Rs. 810 in a total profit of Rs. 2430. The ...</td>\n",
       "      <td>B subscribes to 2/15 of the capital.</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>At a laundromat, it costs a certain amount for a washer and a quar...</td>\n",
       "      <td>Let's denote the cost for a washer as \\( W \\). Samantha does 2 loa...</td>\n",
       "      <td>\\$4</td>\n",
       "      <td>Let the cost for the washer be \\( x \\) dollars. Samantha does 2 lo...</td>\n",
       "      <td>The washer costs \\(\\boxed{\\$4}\\).</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>Find the real roots of the polynomial:\\n\\[ x^5 - 3x^4 + 3x^3 - x^2...</td>\n",
       "      <td>We attempt to factor the polynomial: \\begin{align*} x^5 - 3x^4 + 3...</td>\n",
       "      <td>-1 - \\sqrt{3}, -1 + \\sqrt{3}, -1, 1, 2</td>\n",
       "      <td>The given polynomial is: \\[ x^5 - 3x^4 + 3x^3 - x^2 - 4x + 4 = 0. ...</td>\n",
       "      <td>The real roots of the polynomial are \\(\\boxed{-1, 1, 2}\\).</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>90 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                  problem  \\\n",
       "0   a cistern is two - third full of water . pipe a can fill the remai...   \n",
       "1   In the geometric sequence $\\{a_n\\}$ with common ratio $q > 1$, we ...   \n",
       "2   Given that the derivative of the function $f(x)$ is $f'(x)$, and i...   \n",
       "3   67. Given a point $A$ and a line $l$, $B$ is an arbitrary point on...   \n",
       "4   A unit has a total of 620 staff members. To investigate the time w...   \n",
       "..                                                                    ...   \n",
       "85  Darnel sprinted 0.88 lap and then took a break by jogging 0.75 lap...   \n",
       "86  In hexagon $FIGURE$, $\\angle F \\cong \\angle I \\cong \\angle U \\cong...   \n",
       "87  A, B, C, and D enter into a partnership. A subscribes 1/3 of the c...   \n",
       "88  At a laundromat, it costs a certain amount for a washer and a quar...   \n",
       "89  Find the real roots of the polynomial:\\n\\[ x^5 - 3x^4 + 3x^3 - x^2...   \n",
       "\n",
       "                                                                 solution  \\\n",
       "0   First, let's find out how much time it would take for each pipe to...   \n",
       "1   (1) Since $a_2 = 2$ and $S_3 = 7$ with $q > 1$, We have the system...   \n",
       "2   Given that the derivative of the function $f(x)$ is $f'(x)$, and i...   \n",
       "3   67. The desired geometric locus of points consists of two lines pa...   \n",
       "4   **Analysis** This question examines systematic sampling, which is ...   \n",
       "..                                                                    ...   \n",
       "85  To find out how many laps farther Darnel sprinted than jogged, we ...   \n",
       "86  The sum of the angle measures in a hexagon is \\(180(6-2) = 720\\) d...   \n",
       "87  Let's denote the total capital as X. A subscribes 1/3 of the capit...   \n",
       "88  Let's denote the cost for a washer as \\( W \\). Samantha does 2 loa...   \n",
       "89  We attempt to factor the polynomial: \\begin{align*} x^5 - 3x^4 + 3...   \n",
       "\n",
       "                            example_answer  \\\n",
       "0                                     14.4   \n",
       "1                            \\frac{n}{n+1}   \n",
       "2                            -\\dfrac{9}{4}   \n",
       "3                                 notfound   \n",
       "4                                      394   \n",
       "..                                     ...   \n",
       "85                                    0.13   \n",
       "86                                45^\\circ   \n",
       "87                                    7/15   \n",
       "88                                     \\$4   \n",
       "89  -1 - \\sqrt{3}, -1 + \\sqrt{3}, -1, 1, 2   \n",
       "\n",
       "                                                                reasoning  \\\n",
       "0   The cistern is initially two-thirds full, so the remaining part to...   \n",
       "1   Given that \\(\\{a_n\\}\\) is a geometric sequence with ratio \\(q > 1\\...   \n",
       "2   The function given is \\[f(x) = x^2 + 3x f'(2) + \\ln x.\\] Note that...   \n",
       "3   Given a fixed point \\(A\\) and a line \\(l\\), and an arbitrary point...   \n",
       "4   The total staff is 620, divided into 62 segments, which implies ea...   \n",
       "..                                                                    ...   \n",
       "85  Darnel sprinted 0.88 lap and then jogged 0.75 lap. To find how man...   \n",
       "86  The problem describes a hexagon named FIGURE with six vertices: F,...   \n",
       "87  A's share of profit is Rs. 810 in a total profit of Rs. 2430. The ...   \n",
       "88  Let the cost for the washer be \\( x \\) dollars. Samantha does 2 lo...   \n",
       "89  The given polynomial is: \\[ x^5 - 3x^4 + 3x^3 - x^2 - 4x + 4 = 0. ...   \n",
       "\n",
       "                                                              pred_answer  \\\n",
       "0                                                             4.8 minutes   \n",
       "1   (1) The general term formula for the sequence \\(\\{a_n\\}\\) is \\(\\bo...   \n",
       "2                                                \\(\\boxed{-\\frac{9}{4}}\\)   \n",
       "3   The locus of points \\(M\\) such that \\(ABM\\) is an equilateral tria...   \n",
       "4                                                                     394   \n",
       "..                                                                    ...   \n",
       "85                      Darnel sprinted 0.13 laps farther than he jogged.   \n",
       "86                                                                     30   \n",
       "87                                   B subscribes to 2/15 of the capital.   \n",
       "88                                      The washer costs \\(\\boxed{\\$4}\\).   \n",
       "89             The real roots of the polynomial are \\(\\boxed{-1, 1, 2}\\).   \n",
       "\n",
       "       metric  \n",
       "0              \n",
       "1   ✔️ [True]  \n",
       "2   ✔️ [True]  \n",
       "3              \n",
       "4   ✔️ [True]  \n",
       "..        ...  \n",
       "85  ✔️ [True]  \n",
       "86             \n",
       "87             \n",
       "88  ✔️ [True]  \n",
       "89  ✔️ [True]  \n",
       "\n",
       "[90 rows x 6 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "EvaluationResult(score=52.22, results=<list of 90 results>)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "evaluate = dspy.Evaluate(\n",
    "    devset=test_set,\n",
    "    metric=metric,\n",
    "    num_threads=16,\n",
    "    display_table=True,\n",
    "    display_progress=True\n",
    ")\n",
    "\n",
    "evaluate(program)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dm4dzonddq",
   "metadata": {},
   "source": [
    "### Understanding the Baseline Results\n",
    "\n",
    "The evaluation table shows our model's performance on 90 test problems:\n",
    "\n",
    "**Table Columns:**\n",
    "- `problem`: The mathematical question from NuminaMath-1.5\n",
    "- `example_answer`: Ground truth answer\n",
    "- `reasoning`: Model's chain-of-thought reasoning process\n",
    "- `pred_answer`: Model's final prediction\n",
    "- `metric`: ✔️ indicates correct answer\n",
    "\n",
    "**Key Observations:**\n",
    "- **Baseline Accuracy: ~52%** - The model gets roughly half the problems correct\n",
    "- **Reasoning Quality**: The model generates coherent step-by-step reasoning (see the `reasoning` column)\n",
    "- **Common Failures**: \n",
    "  - Calculation errors (e.g., row 0: predicted 4.8 minutes vs correct 14.4 minutes)\n",
    "  - Misinterpreting problem statements\n",
    "\n",
    "**Why This Matters:**\n",
    "This baseline performance demonstrates that while GPT-4.1 Nano has reasonable mathematical reasoning capability, there's significant room for improvement. GEPA will analyze these errors and automatically refine the prompt to address common failure patterns, potentially boosting accuracy by 10-20 percentage points."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff1c4d03",
   "metadata": {},
   "source": [
    "## GEPA Optimization\n",
    "\n",
    "Apply GEPA optimizer with error-driven feedback to automatically improve the prompt and boost performance."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "nu6vhs2vzgq",
   "metadata": {},
   "source": [
    "### How GEPA Works: Error-Driven Prompt Improvement\n",
    "\n",
    "GEPA (Generalized Error-driven Prompt Augmentation) is an automatic prompt optimization technique that learns from mistakes to improve model performance. Here's how it works:\n",
    "\n",
    "**The GEPA Optimization Cycle:**\n",
    "\n",
    "1. **Evaluation Phase** - Run the model on training examples and collect predictions\n",
    "2. **Error Analysis** - Identify which problems the model got wrong\n",
    "3. **Feedback Generation** - Create detailed feedback explaining:\n",
    "   - What the correct answer should be\n",
    "   - Why the model's answer was wrong\n",
    "   - The complete step-by-step solution\n",
    "4. **Reflection Phase** - Use the reflection LM (Qwen3 Thinking) to:\n",
    "   - Analyze patterns across multiple failed examples\n",
    "   - Identify common failure modes (e.g., \"model miscalculates ratios\", \"model misinterprets word problems\")\n",
    "   - Generate improved prompt instructions to address these patterns\n",
    "5. **Prompt Update** - Modify the system prompt with new guidelines\n",
    "6. **Validation** - Test the updated prompt on validation set\n",
    "7. **Iteration** - Repeat the cycle, keeping only improvements that boost validation accuracy\n",
    "\n",
    "**Why We Need `metric_with_feedback`:**\n",
    "\n",
    "Unlike a standard metric that just returns 0 or 1 (correct/incorrect), `metric_with_feedback` returns:\n",
    "- **Score**: 0 or 1 for correctness\n",
    "- **Feedback**: Rich textual explanation including the ground truth solution\n",
    "\n",
    "This feedback is crucial because GEPA's reflection model needs to understand *why* predictions failed to generate better prompts. The more detailed the feedback, the better GEPA can identify patterns and create targeted improvements.\n",
    "\n",
    "**Key Parameters:**\n",
    "- `auto=\"light\"`: Controls optimization intensity (light/medium/heavy)\n",
    "- `reflection_minibatch_size=16`: Number of errors analyzed together per reflection\n",
    "- `reflection_lm`: The smarter model used for analyzing errors and improving prompts\n",
    "- `num_threads=32`: Parallel evaluation for faster optimization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "74188b9e",
   "metadata": {},
   "outputs": [],
   "source": [
    "def metric_with_feedback(\n",
    "    example: dspy.Example, \n",
    "    prediction: dspy.Prediction, \n",
    "    trace=None, \n",
    "    pred_name=None, \n",
    "    pred_trace=None\n",
    ") -> dspy.Prediction:\n",
    "    \"\"\"\n",
    "    Enhanced evaluation metric with detailed feedback for GEPA optimization.\n",
    "    \n",
    "    Evaluates predictions and generates targeted feedback including error analysis\n",
    "    and the complete solution for learning. Feedback helps GEPA identify failure\n",
    "    patterns and improve prompts.\n",
    "    \n",
    "    Args:\n",
    "        example: DSPy Example with ground truth answer and solution\n",
    "        prediction: DSPy Prediction with model's answer\n",
    "        trace: Optional trace information (unused)\n",
    "        pred_name: Optional prediction name (unused)\n",
    "        pred_trace: Optional prediction trace (unused)\n",
    "    \n",
    "    Returns:\n",
    "        DSPy Prediction with score (0 or 1) and detailed feedback text\n",
    "    \"\"\"\n",
    "    # Extract ground truth and solution\n",
    "    written_solution = example.get('solution', '')\n",
    "    \n",
    "    try:\n",
    "        llm_answer = prediction\n",
    "    except ValueError as e:\n",
    "        # Handle parsing failure with detailed feedback\n",
    "        feedback_text = (\n",
    "            f\"The final answer must be a valid integer and nothing else. \"\n",
    "            f\"You responded with '{prediction.answer}', which couldn't be parsed as a python integer. \"\n",
    "            f\"Please ensure your answer is a valid integer without any additional text or formatting.\"\n",
    "        )\n",
    "        feedback_text += f\" The correct answer is '{example.get('answer', '')}'.\"\n",
    "        \n",
    "        # Include full solution if available\n",
    "        if written_solution:\n",
    "            feedback_text += (\n",
    "                f\" Here's the full step-by-step solution:\\n{written_solution}\\n\\n\"\n",
    "                f\"Think about what takeaways you can learn from this solution to improve \"\n",
    "                f\"your future answers and approach to similar problems and ensure your \"\n",
    "                f\"final answer is a valid integer.\"\n",
    "            )\n",
    "        return dspy.Prediction(score=0, feedback=feedback_text)\n",
    "\n",
    "    # Score: 1 for correct, 0 for incorrect\n",
    "    score = metric(example, llm_answer)\n",
    "\n",
    "    # Generate appropriate feedback based on correctness\n",
    "    feedback_text = \"\"\n",
    "    if score == 1:\n",
    "        feedback_text = f\"Your answer is correct. The correct answer is '{example.get('answer', '')}'.\"\n",
    "    else:\n",
    "        feedback_text = f\"Your answer is incorrect. The correct answer is '{example.get('answer', '')}'.\"\n",
    "\n",
    "    # Append complete solution for learning\n",
    "    if written_solution:\n",
    "        feedback_text += (\n",
    "            f\" Here's the full step-by-step solution:\\n{written_solution}\\n\\n\"\n",
    "            f\"Think about what takeaways you can learn from this solution to improve \"\n",
    "            f\"your future answers and approach to similar problems.\"\n",
    "        )\n",
    "\n",
    "    return dspy.Prediction(score=score, feedback=feedback_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "474cbf4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dspy import GEPA\n",
    "\n",
    "optimizer = GEPA(\n",
    "    metric=metric_with_feedback,\n",
    "    auto=\"light\",\n",
    "    num_threads=32,\n",
    "    track_stats=True,\n",
    "    reflection_minibatch_size=16,\n",
    "    track_best_outputs=True,\n",
    "    add_format_failure_as_feedback=True,\n",
    "    reflection_lm=reflection_lm,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "428f7e36",
   "metadata": {},
   "outputs": [],
   "source": [
    "optimized_program = optimizer.compile(\n",
    "    program,\n",
    "    trainset=train_set,\n",
    "    valset=val_set,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "3bdaf95c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "text\n",
      "Solve the problem step-by-step, following these guidelines:\n",
      "\n",
      "- Carefully read the problem statement to understand all provided data and conditions explicitly.\n",
      "- Define all variables and parameters clearly at the beginning.\n",
      "- For geometry problems:\n",
      "  - Confirm exact shape properties (e.g., isosceles triangle has two equal sides; quadratic equation solutions may form sides where two sides equal one root value and the third side is the other root).\n",
      "  - Apply correct formulas (e.g., circumradius R = abc/(4Δ) or precise isosceles triangle formulas) and verify triangle inequalities (sum of any two sides > third side).\n",
      "- For word problems:\n",
      "  - Correctly interpret phrases (e.g., \"A beats B by 200 meters\" means when A finishes the race, B has run 800 meters).\n",
      "  - For gradual change problems (fleets, age, etc.), track each year/item step-by-step with clear calculations.\n",
      "- For functional equations with recurrences (e.g., f(x) + f(x+1) = 1):\n",
      "  - Break domain into intervals based on integer/fractional parts.\n",
      "  - Apply recurrence relations correctly to express unknowns in terms of known intervals.\n",
      "- For derivative problems:\n",
      "  - Differentiate terms precisely (treat f'(c) as a constant for fixed c).\n",
      "  - Solve equations step by step, including substitution of specific values at the correct stage.\n",
      "- For proofs/identities:\n",
      "  - Simplify algebraically or trigonometrically using standard identities.\n",
      "  - Check key steps (e.g., gcd analysis for integer problems, factorization, modular arithmetic).\n",
      "- For multiple-choice questions:\n",
      "  - Select the correct option letter (e.g., \\boxed{\\text{C}}) after verification.\n",
      "- Always verify results against all problem constraints (e.g., domain restrictions, integer requirements, physical feasibility).\n",
      "- Present the final answer strictly inside \\boxed{} with the required format (numerical value, expression, or option letter).\n",
      "- Never make unwarranted assumptions (e.g., assuming equilateral triangle when only isosceles is specified).\n"
     ]
    }
   ],
   "source": [
    "print(optimized_program.predict.signature.instructions)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74c7476f",
   "metadata": {},
   "source": [
    "## Optimized Program Evaluation\n",
    "\n",
    "Evaluate the GEPA-optimized program to measure the improvement in accuracy and effectiveness."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "02a924dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average Metric: 52.00 / 90 (57.8%): 100%|██████████| 90/90 [01:13<00:00,  1.23it/s]"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025/10/04 20:48:20 INFO dspy.evaluate.evaluate: Average Metric: 52 / 90 (57.8%)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "problem",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "solution",
         "rawType": "object",
         "type": "unknown"
        },
        {
         "name": "example_answer",
         "rawType": "object",
         "type": "unknown"
        },
        {
         "name": "reasoning",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "pred_answer",
         "rawType": "object",
         "type": "string"
        },
        {
         "name": "metric",
         "rawType": "object",
         "type": "string"
        }
       ],
       "ref": "09baa805-ede3-4f05-b35f-17cfe18d7af2",
       "rows": [
        [
         "0",
         "a cistern is two - third full of water . pipe a can fill the remaining part in 12 minutes and pipe b in 8...",
         "First, let's find out how much time it would take for each pipe to fill the cistern completely if it were empty. If Pipe A...",
         "14.4",
         "Given: - Cistern is two-thirds full initially. - Pipe A can fill the remaining one-third in 12 minutes. - Pipe B can fill the remaining...",
         "\\boxed{4.8 \\text{ minutes}} \nor approximately 4 minutes and 48 seconds.",
         ""
        ],
        [
         "1",
         "In the geometric sequence $\\{a_n\\}$ with common ratio $q > 1$, we have $a_2 = 2$ and the sum of the first three terms $S_3...",
         "(1) Since $a_2 = 2$ and $S_3 = 7$ with $q > 1$, We have the system of equations: $\\begin{cases} a_1q = 2 \\\\ a_1(1...",
         "\\frac{n}{n+1}",
         "Given the geometric sequence \\(\\{a_n\\}\\) with common ratio \\(q > 1\\): - \\(a_2 = 2\\) and the sum of the first three terms \\(S_3 =...",
         "(1) \\boxed{a_n = 2^{n-1}} \\; \\\\\n(2) \\boxed{T_n = \\frac{1}{2} - \\frac{1}{n+2}}",
         ""
        ],
        [
         "2",
         "Given that the derivative of the function $f(x)$ is $f'(x)$, and it satisfies the relationship $f(x) = x^{2} + 3xf'(2) + \\ln x$, find the...",
         "Given that the derivative of the function $f(x)$ is $f'(x)$, and it satisfies the relationship $f(x) = x^{2} + 3xf'(2) + \\ln x$, we want...",
         "-\\dfrac{9}{4}",
         "We are given that \\(f(x) = x^2 + 3x f'(2) + \\ln x\\). To find \\(f'(2)\\), which is a constant, we need to differentiate \\(f(x)\\)...",
         "\\boxed{-\\frac{9}{4}}",
         "✔️ [True]"
        ],
        [
         "3",
         "67. Given a point $A$ and a line $l$, $B$ is an arbitrary point on $l$. Find the geometric locus of points $M$ such that...",
         "67. The desired geometric locus of points consists of two lines passing through the point symmetric to point $A$ with respect to line $l$, and...",
         "notfound",
         "Given a fixed point \\(A\\) and a line \\(l\\), and an arbitrary point \\(B\\) on line \\(l\\), the problem is to find the geometric locus...",
         "The locus of points \\(M\\) is the two lines passing through \\(A\\) that form angles of \\(60^\\circ\\) with line \\(l\\), symmetric with respect to line...",
         ""
        ],
        [
         "4",
         "A unit has a total of 620 staff members. To investigate the time workers spend commuting, it was decided to survey 62 workers using a...",
         "**Analysis** This question examines systematic sampling, which is key to understanding that systematic sampling involves sampling at equal intervals, forming an arithmetic sequence. This is...",
         "394",
         "The total number of staff members is 620. The staff was divided into 62 segments, which means each segment contains \\(\\frac{620}{62} = 10\\) workers. The...",
         "394",
         "✔️ [True]"
        ],
        [
         "5",
         "6. Given a prime $p \\equiv 3(\\bmod 4)$. For an integer sequence of length no more than $p-1$ composed of $\\pm 1, \\pm 2, \\cdots,...",
         "6. For $1 \\leqslant k \\leqslant \\frac{p-1}{2}$, the number of balanced sequences with $2k$ terms is $$ \\begin{array}{l} \\mathrm{C}_{2 k}^{k}\\left(\\frac{p-1}{2}\\right)^{k}\\left(\\frac{p-1}{2}\\right)^{k}=\\mathrm{C}_{2 k}^{k}\\left(\\frac{p-1}{2}\\right)^{2 k} . \\\\ \\text...",
         "proof",
         "Let's begin by analyzing the problem and the concepts involved. Given a prime \\( p \\equiv 3 \\pmod{4}\\), we have: - The set of possible...",
         "\\boxed{ \\text{For a prime } p \\equiv 3 \\pmod{4}, \\text{ the number } M_p \\text{, representing the count of balanced sequences, is not a perfect...",
         ""
        ],
        [
         "6",
         "Given the function $f(x)=a\\ln x+ \\frac {1}{2}x^{2}+(a+1)x+1$. $(1)$ When $a=-1$, find the interval of monotonic increase for the function $f(x)$; $(2)$ If the function $f(x)$...",
         "Solution: $(1)$ When $a=-1$, $f(x)=-\\ln x+ \\frac {1}{2}x^{2}+1$. Then $f′(x)=- \\frac {1}{x}+x$. Let $f′(x) > 0$, we get $- \\frac {1}{x}+x > 0$, which is...",
         "3-2 \\sqrt {2}",
         "Given the function \\( f(x) = a \\ln x + \\frac{1}{2} x^2 + (a+1)x + 1 \\). 1. To analyze monotonicity, we differentiate \\( f(x)...",
         "(1) When \\( a = -1 \\), \\( f(x) \\) is increasing on \\( (1, +\\infty) \\). (2) \\( f(x) \\) is increasing on \\(...",
         "✔️ [True]"
        ],
        [
         "7",
         "On a line, there are 2010 points. We perform the following operation: insert one point between every two adjacent points. After conducting this operation 3...",
         "Initially, with 2010 points on the line, there are 2009 segments between these points. After the first operation, we insert 2009 new points, resulting in...",
         "16073",
         "Initially, there are 2010 points on the line. The operation described is to insert one point between every two adjacent points. At each step: -...",
         "\\boxed{16073}",
         "✔️ [True]"
        ],
        [
         "8",
         "Jackie loves to climb trees. She climbed a 1000 foot tall tree. Then she climbed 2 trees that were a certain fraction of the height...",
         "Let's denote the height of the first tree as \\( H_1 \\) and the height of each of the two trees as \\( H_2 \\)....",
         "1:2",
         "Let the height of Jackie's first tree be \\(H = 1000\\) feet. She then climbs two more trees, each a certain fraction \\(f\\) of the...",
         "\\boxed{\\frac{1}{2}}",
         "✔️ [True]"
        ],
        [
         "9",
         "From a group of eight students, how many different five-student committees can be chosen?",
         "We use the combination formula to solve this problem. The formula to calculate the number of ways to choose $k$ students from a group of...",
         "56",
         "The problem is asking for the number of ways to select a 5-student committee from a group of 8 students. This is a combination problem...",
         "\\boxed{56}",
         "✔️ [True]"
        ],
        [
         "10",
         "In $\\triangle ABC$, $BC= a$, $AC= b$, $AB = c$, ${{a}^{2}} -{{c}^{2}} = {{b}^{2}} - bc$, find the angle $A$.",
         "[Analysis] The main focus of this problem is to apply the cosine rule. [Solution] We have $\\cos A= \\frac{{b}^{2}+{c}^{2}-{a}^{2}}{2bc}= \\frac{bc}{2bc}= \\frac{1}{2}$ (using the provided equation...",
         "\\frac{π}{3}",
         "Given the triangle \\( \\triangle ABC \\) with sides \\( BC = a \\), \\( AC = b \\), and \\( AB = c \\)....",
         "\\boxed{60^\\circ}",
         ""
        ],
        [
         "11",
         "In the rectangular coordinate system, a polar coordinate system is established with the origin as the pole and the positive semi-axis of the $x$-axis as...",
         "(1) From ${\\cos }^{2}\\alpha+{\\sin }^{2}\\alpha=1$ we obtain the general equation of curve $C$ as: $$\\frac{y^{2}}{9}+ \\frac{x^{2}}{4}=1$$ Using $x=\\rho\\cos \\theta$ and $y=\\rho\\sin \\theta$ we find the...",
         "\\frac{27}{2}",
         "First, I will convert the given polar equation of line l into rectangular coordinates. The equation is ho\\cos heta - 2 ho\\sin heta - 4...",
         "20",
         ""
        ],
        [
         "12",
         "I randomly pick an integer $p$ between $1$ and $15$ inclusive. What is the probability that I choose a $p$ such that there exists an...",
         "First, let's solve the equation $pq - 6p - 3q = 3$ by manipulating it similarly to the original problem: 1. Add 18 to both...",
         "\\frac{4}{15}",
         "We are looking for integers \\(p\\) between 1 and 15 inclusive for which there exists an integer \\(q\\) satisfying the equation: \\[ pq - 6p...",
         "\\frac{4}{15}",
         "✔️ [True]"
        ],
        [
         "13",
         "$29 \\cdot 5$ is formed by the consecutive two-digit integers from 19 to 92 arranged in sequence to form the integer $N=$ $19202122 \\cdots 909192$....",
         "[Solution] A natural number $n$ is divisible by 3 or 9 $\\Leftrightarrow$ the sum of its digits is divisible by 3 or 9. Consider $$...",
         "1",
         "The number \\( N \\) is formed by concatenating the two-digit integers from 19 to 92: \\[ N = 19202122 \\cdots 909192 \\] We want...",
         "1",
         "✔️ [True]"
        ],
        [
         "14",
         "Given an ellipse equation $\\Gamma$: $\\frac{x^2}{a^2}+\\frac{y^2}{b^2}=1\\left(a \\gt b \\gt 0\\right)$, a line $l: x+y-4\\sqrt{2}=0$, the lower end points of $\\Gamma$ are $A$, $M$ is on...",
         "### Solution: #### Part (1) Given $a=2$, we can find the semi-minor axis $b$ and the linear eccentricity $c$ as follows: - Since $a^2 =...",
         "d_{\\min} = \\frac{8}{3}",
         "Let's analyze each part step-by-step. Given: - Ellipse: \\(\\frac{x^2}{a^2} + \\frac{y^2}{b^2} = 1\\), with \\(a > b > 0\\). - Foci: \\(F_1(-\\sqrt{2}, 0)\\), \\(F_2(\\sqrt{2}, 0)\\)...",
         "(1) \\(\\boxed{M = (0, 4\\sqrt{2})}\\)\n\n(2) \\(\\boxed{b=4\\sqrt{2}}\\)\n\n(3) \\(\\boxed{d=0}\\)",
         ""
        ],
        [
         "15",
         "Given the derivative of the function $f(x)$ is $f'(x)$, and it satisfies $f(x)=2xf'(1)+x^{2}$, then $f'(1)=$　　()\nA: $-1$\nB: $-2$\nC: $1$\nD: $2$",
         "Solution: $f'(x)=2f'(1)+2x$, By setting $x=1$, we get $f'(1)=2f'(1)+2$, $\\therefore f'(1)=-2$, Therefore, the correct answer is $\\boxed{\\text{B}}$. By using the rules of differentiation, we find $f'(x)$,...",
         "\\text{B}",
         "Given the function \\(f(x)\\) and its derivative \\(f'(x)\\), with the relation \\(f(x) = 2x f'(1) + x^2\\). To find \\(f'(1)\\), we differentiate both sides with...",
         "\\boxed{-2}",
         ""
        ],
        [
         "16",
         "If the function $f(x) = 2x^2 - \\ln x$ is not monotonous in a subinterval $(k - 1, k + 1)$ of its domain, then...",
         "1. First, let's find the domain of the function $f(x) = 2x^2 - \\ln x$. Since the natural logarithm $\\ln x$ is only defined for...",
         "1 < k < \\frac{3}{2}",
         "To determine where the function \\(f(x) = 2x^2 - \\ln x\\) is not monotonous within some subinterval \\((k - 1, k + 1)\\), we need...",
         "\\boxed{\\text{B}}",
         ""
        ],
        [
         "17",
         "Determine how many more digits does the base-4 representation of $1729_{10}$ have than the base-6 representation of $1729_{10}$?",
         "First, we compute the number of digits in the base-4 representation of $1729_{10}$: Given that $1729 1024 = 4^5$, the base-4 representation must use 6...",
         "1",
         "We need to find the number of digits in the base-4 and base-6 representations of the decimal number 1729, and then determine how many more...",
         "\\boxed{1}",
         "✔️ [True]"
        ],
        [
         "18",
         "10.",
         "A",
         "A",
         "The problem states only the number 10, without additional context or specific question. Since no explicit question or instructions are provided, the only logical response...",
         "10",
         ""
        ],
        [
         "19",
         "The chord length cut by the line $y=x$ from the circle $x^2+(y+2)^2=4$ is (　　)\n\nA: $\\sqrt{2}$\n\nB: $2\\sqrt{2}$\n\nC: $\\sqrt{3}$\n\nD: $2\\sqrt{3}$",
         "The center of the circle $x^2+(y+2)^2=4$ is at $(0, -2)$, and its radius is 2. Since the distance from the center of the circle to...",
         "\\text{B}",
         "Given the circle \\(x^2 + (y+2)^2 = 4\\), which is centered at \\((0, -2)\\) with radius \\(2\\). The line is \\(y = x\\). Substituting \\(y...",
         "\\boxed{2\\sqrt{2}}",
         ""
        ],
        [
         "20",
         "If a farmer sells ducks for $10 each and chickens for $8 each, and he sells 5 chickens along with some ducks, he then uses...",
         "Let's denote the number of ducks the farmer sold as \\( d \\). First, we calculate the total revenue from selling the chickens and ducks:...",
         "8",
         "Let's denote: - the number of ducks sold as \\(d\\), - the number of chickens sold as 5 (given). The price for each duck is...",
         "\\boxed{8}",
         "✔️ [True]"
        ],
        [
         "21",
         "(3) Six families are sequentially numbered as $1, 2, 3, 4, 5, 6$. Each family has three members, and they all gather to play a...",
         "(3) From 18 people, selecting 3 people, the number of different ways to select is $\\mathrm{C}_{18}^{3}=\\frac{18 \\times 17 \\times 16}{3 \\times 2 \\times 1}=816$. Since...",
         "\\frac{45}{68}",
         "First, note that there are 6 families, each with 3 members, so a total of 18 members. The game involves forming groups of 3 members,...",
         "\\boxed{\\frac{45}{68}}",
         "✔️ [True]"
        ],
        [
         "22",
         "A supermarket has 2355 loaves of bread at the start of the day. By mid-morning, they sell 30% of their loaf stock, and a bakery...",
         "Let's calculate the number of loaves sold by mid-morning first: 30% of 2355 loaves = 0.30 * 2355 = 706.5 loaves Since we can't have...",
         "2888",
         "Begin with the initial amount of bread: 2355 loaves. The supermarket sells 30% of this amount in the morning, so calculate 30% of 2355 and...",
         "\\boxed{2665}",
         ""
        ],
        [
         "23",
         "In the \"Love China\" high school speech competition, 6 judges gave the contestant Xiao Ming the following scores: 7, 9, 6, 7, 9, 8. What...",
         "To find the mode of the set of scores given to Xiao Ming in the \"Love China\" high school speech competition, we first list out...",
         "7 \\text{ and } 9",
         "The set of scores given to Xiao Ming are: 7, 9, 6, 7, 9, 8. To find the mode, we need to identify the value(s)...",
         "The modes are 7 and 9.",
         "✔️ [True]"
        ],
        [
         "24",
         "1.019. $\\frac{0.125: 0.25+1 \\frac{9}{16}: 2.5}{(10-22: 2.3) \\cdot 0.46+1.6}+\\left(\\frac{17}{20}+1.9\\right) \\cdot 0.5$.",
         "## Solution.\n\n$$\n\\begin{aligned}\n& \\frac{0.125: 0.25+1 \\frac{9}{16}: 2.5}{(10-22: 2.3): 0.46+1.6}+\\left(\\frac{17}{20}+1.9\\right) \\cdot 0.5=\\frac{\\frac{1}{2}+\\frac{5}{8}}{\\left(10-\\frac{220}{23}\\right) \\cdot \\frac{23}{50}+\\frac{8}{5}}+\\frac{17}{40}+\\frac{19}{20}= \\\\\n& =\\frac{\\frac{9}{8}}{\\frac{1}{5}+\\frac{8}{5}}+\\frac{17}{40}+\\frac{38}{40}=\\frac{5}{8}+\\frac{11}{8}=2\n\\end{aligned}\n$$\n\nAnswer: 2.",
         "2",
         "Let's carefully evaluate the given expression: \\[ \\frac{0.125 : 0.25 + 1 \\frac{9}{16} : 2.5}{(10 - 22 : 2.3) \\cdot 0.46 + 1.6} + \\left(\\frac{17}{20}...",
         "\\boxed{2}",
         "✔️ [True]"
        ],
        [
         "25",
         "If $f(x)=e^{-x}(\\cos x+\\sin x)$, then $f′(x)=$ ______.",
         "According to the problem, $f(x)=e^{-x}(\\cos x+\\sin x)= \\frac {\\cos x+\\sin x}{e^{x}}$, $f′(x)= \\frac {(\\cos x+\\sin x)′e^{x}-(\\cos x+\\sin x)\\cdot (e^{x})′}{e^{2x}}= \\frac {-2\\sin x}{e^{x}}=-2e^{-x}\\sin x$, The answer...",
         "-2e^{-x}\\sin x",
         "Given the function \\(f(x) = e^{-x} (\\cos x + \\sin x)\\), we need to find its derivative \\(f'(x)\\). This is a product of two functions:...",
         "\\boxed{-2 e^{-x} \\sin x}",
         "✔️ [True]"
        ],
        [
         "26",
         "Joann rode her bike at an average speed of 16 miles per hour for three and a half hours. If her friend, Fran, rides for...",
         "First, let's calculate how far Joann traveled in her time biking: \\( \\text{Distance travelled by Joann} = 3.5 \\text{ hours} \\times 16 \\text{ mph} =...",
         "14",
         "First, find the total distance Joann traveled. She rode at an average speed of 16 miles per hour for 3.5 hours. Distance = speed ×...",
         "\\boxed{14}",
         "✔️ [True]"
        ],
        [
         "27",
         "Let $a$ and $b$ be the roots of the equation $x^2 - mx + 3 = 0$. Suppose that $a + \\frac{1}{b}$ and $b +...",
         "From Vieta's formulas for the first equation, we know $ab = 3$. Applying the new roots to the second equation, we need to find: \\[r...",
         "\\frac{16}{3}",
         "Given the quadratic \\( x^2 - m x + 3 = 0 \\), its roots are \\( a \\) and \\( b \\). From Vieta's...",
         "\\boxed{\\frac{16}{3}}",
         "✔️ [True]"
        ],
        [
         "28",
         "Given a triangle $ABC$, in which the medians $BE$ and $CF$ are perpendicular. Let $M$ is the intersection point of the medians of this triangle,...",
         "1. **Define Points and Properties:** - Let \\( D \\) be the midpoint of \\( BC \\). - Let \\( E \\) be the intersection...",
         null,
         "Given triangle \\(ABC\\), medians \\(BE\\) and \\(CF\\) are perpendicular, with \\(E, F\\) midpoints of sides \\(AC\\) and \\(AB\\), respectively. \\(M\\) is the centroid, which is...",
         "\\boxed{\\text{The line } ML \\text{ is perpendicular to } BC.}",
         "✔️ [True]"
        ],
        [
         "29",
         "The cave is some feet deep and they are already at 849 feet. If they are travelling at speed of 17, they have 369 feet...",
         "If they are already at 849 feet and they have 369 feet farther to go until they reach the end of the cave, we can...",
         "1218",
         "Given variables: - Distance already traveled within the cave: 849 feet. - Remaining distance to reach the end of the cave: 369 feet. - Speed:...",
         "\\boxed{1218}",
         "✔️ [True]"
        ],
        [
         "30",
         "Pentagon ABCDE has a vertical line of symmetry. What is the $y$-coordinate of vertex C so that the area of the pentagon is 50 square...",
         "1. Calculate the area of square $ABDE$. Since each side of the square is $5$ units, its area is $5^2 = 25$ square units. 2....",
         "15",
         "Given the problem, the pentagon ABCDE has a vertical line of symmetry. The points A(0,0), B(0,5), D(5,5), and E(5,0) are fixed, forming the base and...",
         "\\boxed{20}",
         ""
        ],
        [
         "31",
         "Let $n \\ge 5$ be an integer and let $T = \\{5,6,7,\\ldots,n\\}$. Find the smallest value of $n$ such that for every partition of $T$...",
         "We claim that $625$ is the minimal value of $n$. Let the two partitioned sets be $A$ and $B$; we will try to partition $5,...",
         "625",
         "We are given a set \\( T = \\{5, 6, 7, \\ldots, n\\} \\) with \\( n \\ge 5 \\). We seek the smallest \\(...",
         "11",
         ""
        ],
        [
         "32",
         "Shapovalov A.V. The Small and Large Islands have a rectangular shape and are divided into rectangular counties. In each county, a road is laid along...",
         "The image provides an example for nine counties.\n\n![](https://cdn.mathpix.com/cropped/2024_05_06_62351a5cb646b08ac6fbg-03.jpg?height=540&width=612&top_left_y=-1&top_left_x=729)\n\nSend a comment",
         "9",
         "The problem involves two islands, each divided into rectangular counties with roads along diagonals forming closed paths that do not pass through any point more...",
         "7",
         ""
        ],
        [
         "33",
         "James takes 5 oranges and breaks each orange into 8 pieces. He splits the pieces between 4 people. If an orange has 80 calories, how...",
         "First, let's find out how many pieces of orange James has after breaking each one into 8 pieces. Since he has 5 oranges and breaks...",
         "100",
         "James has 5 oranges, each containing 80 calories, so the total calories are 5 * 80 = 400 calories. He breaks each orange into 8...",
         "\\boxed{100}",
         "✔️ [True]"
        ],
        [
         "34",
         "Example 2 As shown in Figure 2, $AC$ and $CE$ are two diagonals of the regular hexagon $ABCDEF$. Points $M$ and $N$ internally divide $AC$...",
         "Solution: Let $\\overrightarrow{A C}=2 \\vec{a}, \\overrightarrow{A F}=2 \\vec{b}$, then $$ \\begin{array}{l} \\overrightarrow{A M}=r \\overrightarrow{A C}=2 r \\vec{a}, \\overrightarrow{A B}=\\vec{a}-\\vec{b} \\text {. } \\\\ \\text {...",
         "\\frac{\\sqrt{3}}{3}",
         "Given a regular hexagon \\( ABCDEF \\), with \\( AC \\) and \\( CE \\) as diagonals. In a regular hexagon, each side and diagonal...",
         "\\(\\boxed{\\frac{1}{\\sqrt{3}}}\\)",
         ""
        ],
        [
         "35",
         "in a group of 8 engineers , 3 engineers have a phd degree ; the others have only an ms or bs degree . a...",
         "To solve this problem, we can use combinations. We want to form teams of 3 engineers that include at least one engineer with a PhD....",
         "46",
         "Total engineers: 8 Engineers with PhD: 3 Engineers without PhD: 5 Number of ways to choose a team of 3 engineers with at least 1...",
         "\\boxed{46}",
         "✔️ [True]"
        ],
        [
         "36",
         "B2 Janez has built a fence around a rectangular garden that is fenced on three sides. The length of the fence is $60 \\mathrm{~m}$, and...",
         "B2. From the given data, we write the relations $2a + b = 60$ and $a \\cdot b = 352$. We solve the resulting system...",
         "22",
         "Let the length of the garden be \\( a \\) meters and the width be \\( b \\) meters, with the condition \\( a >...",
         "\\boxed{\\text{Sides: } 44\\, \\mathrm{m} \\text{ and } 8\\, \\mathrm{m}}",
         ""
        ],
        [
         "37",
         "A director first selects 2 leading actors from a list of 5 actors, which includes 2 actors who have won the Golden Rooster Award and...",
         "The probability is calculated as $\\frac{2}{5} \\times \\frac{1}{4} \\times 3 = \\boxed{\\frac{3}{10}}$.",
         "\\frac{3}{10}",
         "The problem involves selecting actors with specific award types and calculating a probability. Total actors: 5 - 2 Golden Rooster Award winners (G actors) -...",
         "D",
         ""
        ],
        [
         "38",
         "Problem 10. Find all sequences of positive integers $\\left\\{a_{n}\\right\\}_{n=1}^{\\infty}$, such that $a_{4}=4$ and the identity $$ \\frac{1}{a_{1} a_{2} a_{3}}+\\frac{1}{a_{2} a_{3} a_{4}}+\\cdots+\\frac{1}{a_{n} a_{n+1} a_{n+2}}=\\frac{(n+3) a_{n}}{4 a_{n+1}...",
         "10. We rewrite the recurrence relation as $\\frac{(n+2) a_{n-1}}{4 a_{n} a_{n+1}}+\\frac{1}{a_{n} a_{n+1} a_{n+2}}=\\frac{(n+3) a_{n}}{4 a_{n+1} a_{n+2}} \\Longleftrightarrow(n+2) a_{n+2}=\\frac{(n+3) a_{n}^{2}-4}{a_{n-1}}$ for $n \\geq 3$. Setting $n=2$...",
         "a_{n}=n",
         "Given the problem, we are looking for positive integer sequences \\(\\{a_n\\}\\) satisfying that \\(a_4=4\\) and the identity \\[ \\frac{1}{a_1 a_2 a_3} + \\frac{1}{a_2 a_3 a_4}...",
         "\\boxed{a_n = n \\text{ for all } n \\geq 1}",
         ""
        ],
        [
         "39",
         "Find the area of isosceles triangle $DEF$, where sides $DE = DF = 5$ units and $\\angle D = 120^\\circ$.",
         "Since $DEF$ is an isosceles triangle with $\\angle D = 120^\\circ$: 1. Drop a perpendicular line from $D$ to base $EF$ at point $G$, splitting...",
         "\\frac{250}{9}",
         "Given an isosceles triangle \\( DEF \\) with sides \\( DE = DF = 5 \\) units and \\(\\angle D = 120^\\circ \\), identify the...",
         "\\(\\boxed{\\frac{25 \\sqrt{3}}{4}}\\)",
         ""
        ],
        [
         "40",
         "If 70 honey bees make some grams of honey in 70 days, and 1 honey bee will make 1 gram of honey in 70 days,...",
         "If 1 honey bee makes 1 gram of honey in 70 days, then 70 honey bees would make 70 times as much honey in the...",
         "70",
         "We are told that 70 honey bees produce some amount of honey in 70 days, and that 1 honey bee can produce 1 gram of...",
         "\\boxed{70}",
         "✔️ [True]"
        ],
        [
         "41",
         "3. How are the $\\binom{7}{3}=35$ vectors of weight 3 covered by the day-vectors? Seven of them are day-vectors, and the remaining 28 ones must be...",
         "Solution 2: If a dwarf $X$ performs the same type of work on three days $D_{1}, D_{2}, D_{3}$, then we say that this triple of...",
         "proof",
         "The problem discusses coverage of vectors within the set \\( V \\), focusing on vectors of specific weights (number of ones). It states that there...",
         "The key point is that the only day-vector capable of covering the eight heavy vectors (of weight 6 or 7) is the all-ones vector \\((1,1,1,1,1,1,1)\\)....",
         ""
        ],
        [
         "42",
         "Carly is making a beaded corset. She's going to add 50 rows of purple beads with 20 beads per row, 40 rows of blue beads...",
         "First, let's calculate the total number of purple and blue beads Carly will use. For the purple beads: 50 rows * 20 beads per row...",
         "\\$180",
         "First, calculate the total number of beads used in the purple and blue rows: - Purple beads: 50 rows × 20 beads/row = 1000 beads...",
         "\\boxed{80}",
         ""
        ],
        [
         "43",
         "A certain bacterium splits into $2$ every $30$ minutes. How many bacteria of this kind can be produced after $3$ hours? A: $8$ B: $16$...",
         "To solve this problem, we need to understand how the bacterium multiplies over time. Given that one bacterium splits into $2$ every $30$ minutes, we...",
         "D",
         "The bacteria split into 2 every 30 minutes, which implies exponential growth. The total time given is 3 hours, which equals 180 minutes. Since each...",
         "\\boxed{\\text{D}}",
         "✔️ [True]"
        ],
        [
         "44",
         "79. (1) Prove that 1998 cannot be expressed as the sum of any number of consecutive odd numbers; (2) If the numbers from 1 to...",
         "Solution: (1) If it can be expressed, let $n_{0}$ be the first odd number, then $$ \\begin{array}{l} 1998= n_{0}+\\left(n_{0}+2\\right)+\\cdots+\\left(n_{0}+2 k\\right) \\\\ =\\left(n_{0}+2 k\\right)+\\left(n_{0}+2 k-2\\right)+\\cdots \\\\...",
         "586",
         "Let's analyze each part of the problem separately. (1) Prove that 1998 cannot be expressed as the sum of any number of consecutive odd numbers....",
         "(1) 1998 cannot be expressed as the sum of any number of consecutive odd numbers. (2) The sum cannot be 1998, regardless of sign assignment,...",
         "✔️ [True]"
        ],
        [
         "45",
         "Given the universal set $U=\\mathbb{R}$, and the set $M=\\{x|y= \\sqrt{1-x}\\}$, then the complement of $M$ in $U$, denoted as $\\complement_U M$, is ( ) A:...",
         "Since the universal set $U=\\mathbb{R}$, and the set $M=\\{x|y= \\sqrt{1-x}\\}=\\{x|x\\leq1\\}$, then $\\complement_U M=(1，+\\infty)$, thus, the correct choice is: $\\boxed{\\text{D}}$. By determining the set $M$, we...",
         "\\text{D}",
         "The set \\( M = \\{ x \\mid y = \\sqrt{1 - x} \\} \\) describes a relation between \\( x \\) and \\( y...",
         "\\boxed{\\text{D}}",
         "✔️ [True]"
        ],
        [
         "46",
         "Given $a, b \\in \\mathbb{R}$, then \"$0 \\leqslant a \\leqslant 1$ and $0 \\leqslant b \\leqslant 1$\" is a ( ) for \"$0 \\leqslant ab...",
         "If $0 \\leqslant a \\leqslant 1$ and $0 \\leqslant b \\leqslant 1$, then \"$0 \\leqslant ab \\leqslant 1$\" holds. If \"$0 \\leqslant ab \\leqslant 1$\",...",
         "A",
         "The statement \"$0 \\leq a \\leq 1$ and $0 \\leq b \\leq 1$\" implies that \\(a\\) and \\(b\\) are both within the interval \\([0,1]\\). Consequently,...",
         "\\boxed{\\text{A}}",
         "✔️ [True]"
        ],
        [
         "47",
         "27.1. (Belgium, 80). Each of the two urns contains white and black balls, and the total number of balls in both urns is 25. One...",
         "27.1. Let the total number of balls in the first and second urns be \\( m_{\\mathrm{i}} \\) and \\( m_{2} \\) respectively (for definiteness, assume...",
         "0.04",
         "Let the two urns be urn A and urn B. Denote by \\(a_1\\) and \\(a_2\\) the number of white and black balls in urn A,...",
         "0.04",
         "✔️ [True]"
        ],
        [
         "48",
         "3. 26 Prove that the number of integer solutions satisfying the following conditions: $$ \\begin{array}{c} x_{1}+x_{2}+\\cdots+x_{n}=r, \\\\ 1 \\leqslant x_{n} \\leqslant k, \\quad i=1,2, \\cdots,...",
         "None Translate the text above into English, please retain the original text's line breaks and format, and output the translation result directly. Note: The provided...",
         "proof",
         "The problem is asking for the number of integer solutions to the system: \\[ x_1 + x_2 + \\cdots + x_n = r \\] with...",
         "\\[\n\\boxed{\n\\sum_{i=0}^n (-1)^i \\binom{n}{i} \\binom{r - i(k+1)}{n-1}\n}\n\\]",
         ""
        ],
        [
         "49",
         "3. Given the equation $x^{4}-p x^{3}+q=0$ has an integer root, find the prime numbers $p$ and $q$. untranslated text: 已知方程 $x^{4}-p x^{3}+q=0$ 有一整数根,求素数 $p 、...",
         "3. Solution: Let the integer $x$ satisfy $q=x^{3}(p-x)$. Then $x \\mid q$. Since $q$ is a prime number, it can only be that $x= \\pm...",
         "p=3,q=2",
         "Let the integer root of the equation \\(x^4 - p x^3 + q = 0\\) be \\(a\\), where \\(a \\in \\mathbb{Z}\\). Substituting \\(x = a\\),...",
         "p=3,\\quad q=2",
         ""
        ]
       ],
       "shape": {
        "columns": 6,
        "rows": 90
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>problem</th>\n",
       "      <th>solution</th>\n",
       "      <th>example_answer</th>\n",
       "      <th>reasoning</th>\n",
       "      <th>pred_answer</th>\n",
       "      <th>metric</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a cistern is two - third full of water . pipe a can fill the remai...</td>\n",
       "      <td>First, let's find out how much time it would take for each pipe to...</td>\n",
       "      <td>14.4</td>\n",
       "      <td>Given: - Cistern is two-thirds full initially. - Pipe A can fill t...</td>\n",
       "      <td>\\boxed{4.8 \\text{ minutes}} \\nor approximately 4 minutes and 48 se...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>In the geometric sequence $\\{a_n\\}$ with common ratio $q &gt; 1$, we ...</td>\n",
       "      <td>(1) Since $a_2 = 2$ and $S_3 = 7$ with $q &gt; 1$, We have the system...</td>\n",
       "      <td>\\frac{n}{n+1}</td>\n",
       "      <td>Given the geometric sequence \\(\\{a_n\\}\\) with common ratio \\(q &gt; 1...</td>\n",
       "      <td>(1) \\boxed{a_n = 2^{n-1}} \\; \\\\\\n(2) \\boxed{T_n = \\frac{1}{2} - \\f...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Given that the derivative of the function $f(x)$ is $f'(x)$, and i...</td>\n",
       "      <td>Given that the derivative of the function $f(x)$ is $f'(x)$, and i...</td>\n",
       "      <td>-\\dfrac{9}{4}</td>\n",
       "      <td>We are given that \\(f(x) = x^2 + 3x f'(2) + \\ln x\\). To find \\(f'(...</td>\n",
       "      <td>\\boxed{-\\frac{9}{4}}</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>67. Given a point $A$ and a line $l$, $B$ is an arbitrary point on...</td>\n",
       "      <td>67. The desired geometric locus of points consists of two lines pa...</td>\n",
       "      <td>notfound</td>\n",
       "      <td>Given a fixed point \\(A\\) and a line \\(l\\), and an arbitrary point...</td>\n",
       "      <td>The locus of points \\(M\\) is the two lines passing through \\(A\\) t...</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A unit has a total of 620 staff members. To investigate the time w...</td>\n",
       "      <td>**Analysis** This question examines systematic sampling, which is ...</td>\n",
       "      <td>394</td>\n",
       "      <td>The total number of staff members is 620. The staff was divided in...</td>\n",
       "      <td>394</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>Darnel sprinted 0.88 lap and then took a break by jogging 0.75 lap...</td>\n",
       "      <td>To find out how many laps farther Darnel sprinted than jogged, we ...</td>\n",
       "      <td>0.13</td>\n",
       "      <td>Darnel sprinted 0.88 lap and then jogged 0.75 lap. To find out how...</td>\n",
       "      <td>\\boxed{0.13}</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>In hexagon $FIGURE$, $\\angle F \\cong \\angle I \\cong \\angle U \\cong...</td>\n",
       "      <td>The sum of the angle measures in a hexagon is \\(180(6-2) = 720\\) d...</td>\n",
       "      <td>45^\\circ</td>\n",
       "      <td>The problem describes a hexagon labeled FIGURE with certain angle ...</td>\n",
       "      <td>45</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>A, B, C, and D enter into a partnership. A subscribes 1/3 of the c...</td>\n",
       "      <td>Let's denote the total capital as X. A subscribes 1/3 of the capit...</td>\n",
       "      <td>7/15</td>\n",
       "      <td>Let's denote the total capital as 1 (or 1 fraction). The capital s...</td>\n",
       "      <td>The fraction of the capital subscribed by B is \\boxed{0}.</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>At a laundromat, it costs a certain amount for a washer and a quar...</td>\n",
       "      <td>Let's denote the cost for a washer as \\( W \\). Samantha does 2 loa...</td>\n",
       "      <td>\\$4</td>\n",
       "      <td>Let the cost of using the washer be \\(w\\) dollars. Since each load...</td>\n",
       "      <td>\\boxed{4}</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>Find the real roots of the polynomial:\\n\\[ x^5 - 3x^4 + 3x^3 - x^2...</td>\n",
       "      <td>We attempt to factor the polynomial: \\begin{align*} x^5 - 3x^4 + 3...</td>\n",
       "      <td>-1 - \\sqrt{3}, -1 + \\sqrt{3}, -1, 1, 2</td>\n",
       "      <td>The polynomial is \\(x^5 - 3x^4 + 3x^3 - x^2 - 4x + 4\\). To find it...</td>\n",
       "      <td>\\boxed{-1,\\ 1,\\ 2}</td>\n",
       "      <td>✔️ [True]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>90 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                  problem  \\\n",
       "0   a cistern is two - third full of water . pipe a can fill the remai...   \n",
       "1   In the geometric sequence $\\{a_n\\}$ with common ratio $q > 1$, we ...   \n",
       "2   Given that the derivative of the function $f(x)$ is $f'(x)$, and i...   \n",
       "3   67. Given a point $A$ and a line $l$, $B$ is an arbitrary point on...   \n",
       "4   A unit has a total of 620 staff members. To investigate the time w...   \n",
       "..                                                                    ...   \n",
       "85  Darnel sprinted 0.88 lap and then took a break by jogging 0.75 lap...   \n",
       "86  In hexagon $FIGURE$, $\\angle F \\cong \\angle I \\cong \\angle U \\cong...   \n",
       "87  A, B, C, and D enter into a partnership. A subscribes 1/3 of the c...   \n",
       "88  At a laundromat, it costs a certain amount for a washer and a quar...   \n",
       "89  Find the real roots of the polynomial:\\n\\[ x^5 - 3x^4 + 3x^3 - x^2...   \n",
       "\n",
       "                                                                 solution  \\\n",
       "0   First, let's find out how much time it would take for each pipe to...   \n",
       "1   (1) Since $a_2 = 2$ and $S_3 = 7$ with $q > 1$, We have the system...   \n",
       "2   Given that the derivative of the function $f(x)$ is $f'(x)$, and i...   \n",
       "3   67. The desired geometric locus of points consists of two lines pa...   \n",
       "4   **Analysis** This question examines systematic sampling, which is ...   \n",
       "..                                                                    ...   \n",
       "85  To find out how many laps farther Darnel sprinted than jogged, we ...   \n",
       "86  The sum of the angle measures in a hexagon is \\(180(6-2) = 720\\) d...   \n",
       "87  Let's denote the total capital as X. A subscribes 1/3 of the capit...   \n",
       "88  Let's denote the cost for a washer as \\( W \\). Samantha does 2 loa...   \n",
       "89  We attempt to factor the polynomial: \\begin{align*} x^5 - 3x^4 + 3...   \n",
       "\n",
       "                            example_answer  \\\n",
       "0                                     14.4   \n",
       "1                            \\frac{n}{n+1}   \n",
       "2                            -\\dfrac{9}{4}   \n",
       "3                                 notfound   \n",
       "4                                      394   \n",
       "..                                     ...   \n",
       "85                                    0.13   \n",
       "86                                45^\\circ   \n",
       "87                                    7/15   \n",
       "88                                     \\$4   \n",
       "89  -1 - \\sqrt{3}, -1 + \\sqrt{3}, -1, 1, 2   \n",
       "\n",
       "                                                                reasoning  \\\n",
       "0   Given: - Cistern is two-thirds full initially. - Pipe A can fill t...   \n",
       "1   Given the geometric sequence \\(\\{a_n\\}\\) with common ratio \\(q > 1...   \n",
       "2   We are given that \\(f(x) = x^2 + 3x f'(2) + \\ln x\\). To find \\(f'(...   \n",
       "3   Given a fixed point \\(A\\) and a line \\(l\\), and an arbitrary point...   \n",
       "4   The total number of staff members is 620. The staff was divided in...   \n",
       "..                                                                    ...   \n",
       "85  Darnel sprinted 0.88 lap and then jogged 0.75 lap. To find out how...   \n",
       "86  The problem describes a hexagon labeled FIGURE with certain angle ...   \n",
       "87  Let's denote the total capital as 1 (or 1 fraction). The capital s...   \n",
       "88  Let the cost of using the washer be \\(w\\) dollars. Since each load...   \n",
       "89  The polynomial is \\(x^5 - 3x^4 + 3x^3 - x^2 - 4x + 4\\). To find it...   \n",
       "\n",
       "                                                              pred_answer  \\\n",
       "0   \\boxed{4.8 \\text{ minutes}} \\nor approximately 4 minutes and 48 se...   \n",
       "1   (1) \\boxed{a_n = 2^{n-1}} \\; \\\\\\n(2) \\boxed{T_n = \\frac{1}{2} - \\f...   \n",
       "2                                                    \\boxed{-\\frac{9}{4}}   \n",
       "3   The locus of points \\(M\\) is the two lines passing through \\(A\\) t...   \n",
       "4                                                                     394   \n",
       "..                                                                    ...   \n",
       "85                                                           \\boxed{0.13}   \n",
       "86                                                                     45   \n",
       "87              The fraction of the capital subscribed by B is \\boxed{0}.   \n",
       "88                                                              \\boxed{4}   \n",
       "89                                                     \\boxed{-1,\\ 1,\\ 2}   \n",
       "\n",
       "       metric  \n",
       "0              \n",
       "1              \n",
       "2   ✔️ [True]  \n",
       "3              \n",
       "4   ✔️ [True]  \n",
       "..        ...  \n",
       "85  ✔️ [True]  \n",
       "86  ✔️ [True]  \n",
       "87             \n",
       "88  ✔️ [True]  \n",
       "89  ✔️ [True]  \n",
       "\n",
       "[90 rows x 6 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "EvaluationResult(score=57.78, results=<list of 90 results>)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "evaluate(optimized_program)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "skmsf5j36v",
   "metadata": {},
   "source": [
    "### Understanding the Optimization Results\n",
    "\n",
    "**Performance Improvement:**\n",
    "- **Baseline Accuracy**: 52.2% (47/90 correct)\n",
    "- **Optimized Accuracy**: 57.8% (52/90 correct)\n",
    "- **Improvement**: +5.6 percentage points (~11% relative improvement)\n",
    "\n",
    "**What Changed:**\n",
    "See the instruction GEPA developed above.\n",
    "\n",
    "**Why the Modest Improvement?**\n",
    "\n",
    "The ~6% gain is expected given:\n",
    "1. **Small Training Set**: Only 112 training examples (0.025% of full dataset)\n",
    "2. **Light Optimization**: Using `auto=\"light\"` for faster iteration\n",
    "3. **Simple Baseline**: Chain-of-Thought already provides decent reasoning structure\n",
    "4. **Model Limitations**: GPT-4.1 Nano's mathematical capabilities are the ceiling\n",
    "\n",
    "**Cost Efficiency:**\n",
    "\n",
    "This entire experiment (baseline evaluation, GEPA optimization, and final evaluation on 224 examples) cost **less than $0.50** thanks to:\n",
    "- GPT-4.1 Nano's low pricing ($0.10/M input, $0.40/M output)\n",
    "- Asymmetric architecture (cheap model for 99% of calls, smart model for 1%)\n",
    "- Small sample size for demonstration purposes\n",
    "\n",
    "**Key Takeaway:**\n",
    "\n",
    "Even with limited data and light optimization, GEPA successfully identified failure patterns and generated targeted prompt improvements. With more training data (`sample_fraction=0.01` or higher) and heavier optimization (`auto=\"medium\"` or `\"heavy\"`), we'd expect 15-25% improvements, potentially reaching 65-70% accuracy."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cuj307bhp8f",
   "metadata": {},
   "source": [
    "## Learn More\n",
    "\n",
    "This notebook introduced DSPy's GEPA optimizer for automated prompt improvement. Here are additional resources to deepen your understanding:\n",
    "\n",
    "### DSPy Framework\n",
    "- **[DSPy Documentation](https://dspy.ai/)** - Official documentation and guides\n",
    "- **[DSPy GitHub Repository](https://github.com/stanfordnlp/dspy)** - Source code and examples\n",
    "- **[DSPy Research Paper](https://arxiv.org/abs/2310.03714)** - \"DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines\"\n",
    "- **[DSPy Tutorial Series](https://dspy.ai/learn/programming/)** - Step-by-step learning path\n",
    "\n",
    "### Prompt Optimization\n",
    "- **[GEPA Optimizer Documentation](https://dspy.ai/api/optimizers/GEPA/)** - Technical details on GEPA\n",
    "- **[Chain-of-Thought Prompting](https://arxiv.org/abs/2201.11903)** - Foundational paper on CoT reasoning\n",
    "- **[Automatic Prompt Engineering](https://arxiv.org/abs/2211.01910)** - \"Large Language Models Are Human-Level Prompt Engineers\"\n",
    "- **[DSPy Optimizers Comparison](https://dspy.ai/api/optimizers/)** - Overview of different optimization strategies\n",
    "\n",
    "### Mathematical Reasoning\n",
    "- **[NuminaMath Dataset](https://huggingface.co/datasets/AI-MO/NuminaMath-1.5)** - The dataset used in this notebook\n",
    "- **[GSM8K Dataset](https://huggingface.co/datasets/gsm8k)** - Grade school math word problems benchmark\n",
    "- **[MATH Dataset](https://huggingface.co/datasets/hendrycks/competition_math)** - Competition-level mathematics problems\n",
    "- **[Mathematical Reasoning with LLMs](https://arxiv.org/abs/2206.14858)** - Survey of techniques\n",
    "\n",
    "### Related Techniques\n",
    "- **[Few-Shot Learning](https://arxiv.org/abs/2005.14165)** - \"Language Models are Few-Shot Learners\" (GPT-3 paper)\n",
    "- **[Self-Consistency](https://arxiv.org/abs/2203.11171)** - Improving reasoning via multiple sampling paths\n",
    "- **[ReAct Prompting](https://arxiv.org/abs/2210.03629)** - Reasoning and Acting in language models\n",
    "\n",
    "### Tools and Platforms\n",
    "- **[OpenRouter](https://openrouter.ai/)** - Unified API for multiple LLM providers\n",
    "- **[Hugging Face Datasets](https://huggingface.co/docs/datasets/)** - Dataset loading and processing\n",
    "- **[DSPy Optimizers Guide](https://dspy.ai/deep-dive/optimizers/)** - Deep dive into optimization strategies"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "L4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "behrooz",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
